Abstract

This paper starts by considering the minimization of the Renyi divergence subject to a constraint on 
the total variation distance. Based on the solution of this optimization problem, the exact locus of the 
points $(D(Q\|P_1), D(Q\|P_2))$ is determined when $P_1, P_2, Q$ are arbitrary probability measures which are
mutually absolutely continuous, and the total variation distance between $P_1$ and $P_2$ is not below a given
value. It is further shown that all the points of this convex region are attained by probability measures 
which are defined on a binary alphabet. This characterization yields a geometric interpretation of the 
minimal Chernoff information subject to a constraint on the variational distance. 

This paper also derives an exponential upper bound on the performance of binary linear block codes 
(or code ensembles) under maximum-likelihood decoding. Its derivation relies on the Gallager bounding 
technique, and it reproduces the Shulman-Feder bound as a special case. The bound is expressed in terms of 
the Renyi divergence from the normalized distance spectrum of the code (or the average distance spectrum 
of the ensemble) to the binomially distributed distance spectrum of the capacity-achieving ensemble 
of random block codes. This exponential bound provides a quantitative measure of the degradation in 
performance of binary linear block codes (or code ensembles) as a function of the deviation of their 
distance spectra from the binomial distribution. An efficient use of this bound is considered. 
I. Introduction

The Renyi divergence, introduced in [30], has been studied so far in various information- 
theoretic contexts (and it has been actually used before it had a name [37]). These include 
generalized cutoff rates and error exponents for hypothesis testing ([1], [6], [38]), guessing 
moments ([2], [9]), source and channel coding error exponents ([2], [12], [22], [27], [37]), 
strong converse theorems for classes of networks [11], strong data processing theorems for 
discrete memoryless channels [28], bounds for joint source-channel coding [41], and one-shot 
bounds for information-theoretic problems [46]. 

In [14], Gilardoni derived a Pinsker-type lower bound on the Renyi divergence $D_\alpha(P\|Q)$
for $\alpha \in (0,1)$. In view of the fact that this lower bound is not tight, especially when the total
variation distance $|P - Q|$ is large, this paper starts by considering the minimization of the Renyi
divergence $D_\alpha(P\|Q)$, for an arbitrary $\alpha > 0$, subject to a given (or minimal) value of the total
variation distance. Note that the minimization here is taken over all pairs of probability measures with
a total variation distance which is not below a given value; this problem differs from the type
of problems studied in [3] and [24], in connection to the minimization of the relative entropy
$D(P\|Q)$ subject to a minimal value of the total variation distance with a fixed probability
measure $Q$. The solution of this problem generalizes the problem of minimizing the relative
entropy $D(P\|Q)$ subject to a given value of the total variation distance, where the latter is a
special case with $\alpha = 1$ (see [10], [13], [29]).

One possible way to deal with this problem stems from the fact that the Renyi divergence is a
one-to-one transformation of the Hellinger divergence $\mathscr{H}_\alpha(P\|Q)$ where, for $\alpha \in (0,1) \cup (1,\infty)$,

$$D_\alpha(P\|Q) = \frac{1}{\alpha-1} \, \log\bigl(1 + (\alpha-1)\, \mathscr{H}_\alpha(P\|Q)\bigr) \tag{1}$$

and $\mathscr{H}_\alpha(P\|Q)$ is an $f$-divergence; since the total variation distance is also an $f$-divergence,
this problem can be viewed as a minimization of an $f$-divergence subject to a constraint on
another $f$-divergence. The numerical optimization of an $f$-divergence subject to simultaneous
constraints on $f_i$-divergences ($i = 1,\ldots,L$) was recently studied in [15], where it has been
shown that it suffices to restrict attention to alphabets of cardinality $L+2$. In fact, as shown
in [44, (22)], a binary alphabet suffices if there is a single constraint (i.e., $L = 1$) which is on
the total variation distance. In view of (1), the same conclusion also holds when minimizing
the Renyi divergence subject to a constraint on the total variation distance. To set notation, the
divergences $D(P\|Q)$, $|P - Q|$, $\mathscr{H}_\alpha(P\|Q)$, $D_\alpha(P\|Q)$ are defined at the end of this section,
being consistent with the notation in [35] and [45].

This paper treats this minimization problem of the Renyi divergence in a different way. We 
first generalize the analysis in [10], which was used for the minimization of the relative entropy 
subject to a constraint on the variational distance, for proving that it suffices to restrict attention 
to probability measures which are defined on a binary alphabet. Furthermore, the continuation 
of the analysis in this paper relies on the Lagrange duality, and a solution of the Karush-Kuhn- 
Tucker (KKT) equations while asserting strong duality for the studied problem. The use of 
Lagrange duality further simplifies the computational task of the studied minimization problem. 

As complementary results to the minimization problem studied in this paper, the reader is
referred to [35, Section 8] which provides upper bounds on the Renyi divergence $D_\alpha(P\|Q)$ for
an arbitrary $\alpha \in (0, \infty)$ as a function of either the total variation distance or the relative entropy in
case that the relative information is bounded.

The solution of the minimization problem of the Renyi divergence, subject to a constraint
on the total variation distance, provides an elegant way for the characterization of the exact
locus of the points $(D(Q\|P_1), D(Q\|P_2))$ where $P_1$ and $P_2$ are probability measures whose
total variation distance is not below a given value $\varepsilon$, and $Q$ is an arbitrary probability measure.
It is further shown in this paper that all the points of this convex region can be attained by a
triple of probability measures $(P_1, P_2, Q)$ which are defined on a binary alphabet.

In view of the characterization of the exact locus of these points, a geometric interpretation 
is provided in this paper for the minimal Chernoff information between $P_1$ and $P_2$, denoted
by $C(P_1, P_2)$, subject to an $\varepsilon$-separation constraint on the variational distance between $P_1$
and $P_2$. It is demonstrated in the following that the intersection point at the boundary of the
locus of $(D(Q\|P_1), D(Q\|P_2))$ and the straight line $D(Q\|P_1) = D(Q\|P_2)$ is the point whose
coordinates are equal to the minimal value of $C(P_1, P_2)$ under the constraint $|P_1 - P_2| \ge \varepsilon$. The
reader is referred to [48], which relies on the closed-form expression in [31, Proposition 2] for 
the minimization of the constrained Chernoff information, and which analyzes the problem of 
channel-code detection by a third-party receiver via the likelihood ratio test. In the latter problem, 
a third-party receiver has to detect the channel code used by the transmitter by observing a large 
number of noise-affected codewords; this setup has applications in security or cognitive radios, 
or in link adaptation in some wireless technologies. 

Since the Renyi divergence $D_\alpha(P\|Q)$ generalizes the relative entropy $D(P\|Q)$,
where the latter corresponds to $\alpha = 1$, the approach suggested in this paper for the
characterization of the exact locus of pairs of relative entropies in view of a solution to a minimization
problem of the Renyi divergence is analogous to the usefulness of complex analysis in solving
real-valued problems. We consider the analysis of this problem to be mathematically
pleasing in its own right. Note, however, that a special point at the boundary of this locus
has an operational meaning in view of [48] (see the previous paragraph).
The problem studied here differs from the study in [17], which considered the joint
range of $f$-divergences for pairs (rather than triplets) of probability measures.

The performance analysis of linear codes under maximum-likelihood (ML) decoding is of 
interest for studying the potential performance of these codes under optimal decoding, and for 
the evaluation of the degradation in performance that is incurred by the use of sub-optimal and 
practical decoding algorithms. The reader is referred to [32] which is focused on this topic. 

The second part of this paper derives an exponential upper bound on the performance of ML 
decoded binary linear block codes (or code ensembles). Its derivation relies on the Gallager 
bounding technique (see [32, Chapter 4], [36]), and it reproduces the Shulman-Feder bound [40] 
as a special case. The new exponential bound derived in this paper is expressed in terms of 
the Renyi divergence from the normalized distance spectrum of the code (or average distance 
spectrum of the ensemble) to the binomial distribution which characterizes the average distance 
spectrum of the capacity-achieving ensemble of fully random block codes. This exponential 
bound provides a quantitative measure of the degradation in performance of binary linear block 
codes (or code ensembles) as a function of the deviation of their (average) distance spectra from 
the binomial distribution, and its use is exemplified for an ensemble of turbo-block codes. 

This paper is structured as follows: Section II solves the minimization problem for the Renyi
divergence under a constraint on the total variation distance. Section III uses the solution of
this minimization problem to obtain an exact characterization of the joint range of the relative
entropies in the considered setting above. Section IV provides a new exponential upper bound
on the block error probability of ML decoded binary linear block codes, which is expressed in 
terms of the Renyi divergence, suggests an efficient way to apply the bound to the performance 
evaluation of binary linear block codes (or code ensembles), and exemplifies its use. Throughout 
this paper, logarithms are to the base e. 

We end this section by introducing the definitions and notation used in this work, which are 
consistent with [35], [45], and are included here for the convenience of the reader. 
Definitions and Notation


We assume throughout that the probability measures $P$ and $Q$ are defined on a common
measurable space $(\mathcal{A}, \mathscr{F})$, and $P \ll Q$ denotes that $P$ is absolutely continuous with respect
to $Q$, namely there is no event $\mathcal{F} \in \mathscr{F}$ such that $P(\mathcal{F}) > 0 = Q(\mathcal{F})$. Let $\frac{dP}{dQ}$ denote the
Radon-Nikodym derivative (or density) of $P$ with respect to $Q$.

Definition 1 (Relative entropy): The relative entropy is given by

$$D(P\|Q) = \int_{\mathcal{A}} dP \, \log\Bigl(\frac{dP}{dQ}\Bigr). \tag{2}$$

Definition 2 (Total variation distance): The total variation distance is given by

$$|P - Q| = \int_{\mathcal{A}} \Bigl| \frac{dP}{dQ} - 1 \Bigr| \, dQ. \tag{3}$$


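Definition 3 (Hellinger divergence): The Hellinger divergence of order $\alpha \in (0,1) \cup (1,\infty)$ is
given by

$$\mathscr{H}_\alpha(P\|Q) = \frac{1}{\alpha-1} \Bigl( \int_{\mathcal{A}} dQ \, \Bigl(\frac{dP}{dQ}\Bigr)^{\alpha} - 1 \Bigr). \tag{4}$$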


The analytic extension of $\mathscr{H}_\alpha(P\|Q)$ at $\alpha = 1$ yields $\mathscr{H}_1(P\|Q) = D(P\|Q)$ (nats).


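Definition 4 (Renyi divergence): The Renyi divergence of order $\alpha \ge 0$ is given as follows:

• If $\alpha \in (0,1) \cup (1,\infty)$, then

$$D_\alpha(P\|Q) = \frac{1}{\alpha-1} \, \log\Bigl( \int_{\mathcal{A}} dQ \, \Bigl(\frac{dP}{dQ}\Bigr)^{\alpha} \Bigr). \tag{5}$$

• If $\alpha = 0$, then

$$D_0(P\|Q) = \max_{\mathcal{F} \in \mathscr{F}:\, P(\mathcal{F}) = 1} \log\Bigl(\frac{1}{Q(\mathcal{F})}\Bigr). \tag{6}$$

• $D_1(P\|Q) = D(P\|Q)$, which is the analytic extension of $D_\alpha(P\|Q)$ at $\alpha = 1$.

• If $\alpha = +\infty$, then

$$D_\infty(P\|Q) = \log\Bigl( \operatorname{ess\,sup} \frac{dP}{dQ}(Y) \Bigr) \tag{7}$$

with $Y \sim Q$.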

Definition 5 (Chernoff information): The Chernoff information between probability measures
$P_1$ and $P_2$ is expressed as follows in terms of the Renyi divergence:

$$C(P_1, P_2) = \max_{\alpha \in [0,1]} \bigl\{ (1-\alpha) \, D_\alpha(P_1\|P_2) \bigr\} \tag{8}$$

and it is the best achievable exponent in the Bayesian probability of error for binary hypothesis
testing (see, e.g., [5, Theorem 11.9.1]).
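The definitions above can be illustrated for probability mass functions on a finite alphabet. The following is a minimal sketch, assuming strictly positive PMFs given as Python lists; the function names and the example PMFs `P` and `Q` are illustrative choices, not part of the paper. It evaluates (2)-(5) and checks numerically that the Renyi and Hellinger divergences are related as in (1), and that $D_\alpha$ approaches the relative entropy as $\alpha \to 1$.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) in nats for finite PMFs with P << Q, as in (2)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """|P - Q| as in (3); this is the L1 distance, so it lies in [0, 2]."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger_divergence(p, q, alpha):
    """Hellinger divergence of order alpha in (0,1) or (1,inf), as in (4)."""
    s = sum(qi * (pi / qi) ** alpha for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha - 1.0)

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha in (0,1) or (1,inf), as in (5)."""
    s = sum(qi * (pi / qi) ** alpha for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

# Illustrative PMFs (an assumption made only for this example).
P = [0.5, 0.3, 0.2]
Q = [0.2, 0.5, 0.3]

for alpha in (0.5, 2.0):
    lhs = renyi_divergence(P, Q, alpha)
    # Right-hand side of (1): one-to-one transformation of the Hellinger divergence.
    rhs = math.log(1 + (alpha - 1) * hellinger_divergence(P, Q, alpha)) / (alpha - 1)
    print(alpha, lhs, rhs)

# The order-1 limit recovers the relative entropy (up to a small numerical error).
print(renyi_divergence(P, Q, 1 + 1e-6), relative_entropy(P, Q))
print(total_variation(P, Q))
```

The orders $\alpha = 0$ and $\alpha = \infty$ in (6) and (7) are deliberately left out of this sketch, since they require separate formulas.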
II. Minimization of the Renyi Divergence with a Constrained Total Variation Distance

In this section, we derive a tight lower bound on the Renyi divergence $D_\alpha(P_1\|P_2)$ subject
to an equality constraint on the total variation distance $|P_1 - P_2| = \varepsilon$ where $\varepsilon \in [0,2)$ is
fixed; alternatively, it can be regarded as a minimization problem under the inequality constraint
$|P_1 - P_2| \ge \varepsilon$. It is first shown that this lower bound is attained for probability measures defined
on a binary alphabet, and Lagrange duality is used to further simplify the computation
of this bound. The special case $\alpha = 1$, which reduces to the minimization of the
relative entropy subject to a fixed total variation distance, has been studied extensively, and three
equivalent forms of the solution to this optimization problem were derived in [10], [13], [29].

In [14, Corollaries 6 and 9], Gilardoni derived two Pinsker-type lower bounds on the Renyi
divergence of order $\alpha \in (0,1)$, expressed in terms of the total variation distance. Among these
two bounds, the improved lower bound is given (in nats) by

$$D_\alpha(P\|Q) \ge \tfrac{1}{2} \, \alpha \varepsilon^2 + \tfrac{1}{36} \, \alpha (1 + 5\alpha - 5\alpha^2) \, \varepsilon^4, \quad \forall \, \alpha \in (0,1) \tag{9}$$

where $|P - Q| = \varepsilon$ denotes the total variation distance between $P$ and $Q$. Note that in the limit
where $\varepsilon$ tends to 2 (from below), this lower bound converges to a finite value which is at most
$\frac{22}{9}$; it is, however, an artifact of the lower bound in view of the next lemma.

Lemma 1:

$$\lim_{\varepsilon \uparrow 2} \; \inf_{P, Q:\, |P - Q| = \varepsilon} D_\alpha(P\|Q) = \infty, \quad \forall \, \alpha > 0. \tag{10}$$

Proof: See Appendix I-A. ■

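In the following, we derive a tight lower bound which is shown to be achievable by a restriction
of the probability measures to a binary alphabet. For $\alpha > 0$, let

$$g_\alpha(\varepsilon) \triangleq \min_{P_1, P_2:\, |P_1 - P_2| = \varepsilon} D_\alpha(P_1\|P_2) \tag{11}$$

$$\phantom{g_\alpha(\varepsilon)} = \min_{P_1, P_2:\, |P_1 - P_2| \ge \varepsilon} D_\alpha(P_1\|P_2), \quad \forall \, \varepsilon \in [0,2). \tag{12}$$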


In the following, we evaluate the function $g_\alpha$. In view of [10, Section 2], which characterizes
the minimum of the relative entropy in terms of the total variation distance, we first extend the
argument in [10] to prove the next lemma.

Lemma 2: For an arbitrary $\alpha > 0$, the minimization in (11) is attained by probability measures
which are defined on a binary alphabet.
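Proof: See Appendix I-B. ■

The following proposition enables the calculation of $g_\alpha$ for an arbitrary positive $\alpha$.

Proposition 1: Let $\alpha \in (0,1) \cup (1,\infty)$ and $\varepsilon \in [0,2)$. The function $g_\alpha$ in (11) satisfies

$$g_\alpha(\varepsilon) = \min_{p, q \in [0,1]:\, |p - q| \ge \frac{\varepsilon}{2}} d_\alpha(p\|q) \tag{13}$$

where

$$d_\alpha(p\|q) = \frac{\log\bigl( p^{\alpha} q^{1-\alpha} + (1-p)^{\alpha} (1-q)^{1-\alpha} \bigr)}{\alpha - 1} \tag{14}$$

denotes the binary Renyi divergence.

Proof: This directly follows from Lemma 2. ■

Proposition 2:

$$g_{1/2}(\varepsilon) = -\log\bigl(1 - \tfrac{1}{4} \varepsilon^2\bigr), \quad \forall \, \varepsilon \in [0,2) \tag{15}$$

and

$$g_2(\varepsilon) = \begin{cases} \log(1 + \varepsilon^2), & \text{if } \varepsilon \in [0,1], \\ -\log\bigl(1 - \tfrac{1}{2} \varepsilon\bigr), & \text{if } \varepsilon \in (1,2). \end{cases} \tag{16}$$

Furthermore, for $\alpha \in (0,1)$ and $\varepsilon \in [0,2)$,

$$g_\alpha(\varepsilon) = \frac{\alpha}{1-\alpha} \; g_{1-\alpha}(\varepsilon) \tag{17}$$

and

$$g_\alpha(\varepsilon) \ge c_1(\alpha) \, \log\Bigl(\frac{1}{1 - \frac{1}{2}\varepsilon}\Bigr) + c_2(\alpha) \tag{18}$$

where

$$c_1(\alpha) = \min\Bigl\{1, \frac{\alpha}{1-\alpha}\Bigr\}, \qquad c_2(\alpha) = -\frac{\log 2}{1-\alpha}. \tag{19}$$

Proof: See Appendix II. ■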
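Propositions 1 and 2 lend themselves to a quick numerical sanity check. The sketch below is only illustrative: it approximates $g_\alpha(\varepsilon)$ in (13) by a crude grid search, and it assumes that the minimum is attained on the boundary $p = q + \frac{\varepsilon}{2}$ of the constraint (the choice later made in Proposition 3). The names `binary_renyi`, `g_grid`, `g_half`, `g_two` and `lower_bound`, as well as the grid size, are assumptions of this example. It compares the grid values with the closed forms (15) and (16), and it checks the skew-symmetry (17) and the lower bound (18)-(19).

```python
import math

def binary_renyi(p, q, alpha):
    """Binary Renyi divergence d_alpha(p||q) of (14); alpha in (0,1) or (1,inf)."""
    s = p**alpha * q**(1 - alpha) + (1 - p)**alpha * (1 - q)**(1 - alpha)
    return math.log(s) / (alpha - 1)

def g_grid(alpha, eps, n=4000):
    """Grid approximation of g_alpha(eps) in (13), restricted to p = q + eps/2."""
    h = eps / 2
    qs = ((1 - h) * k / n for k in range(1, n))  # interior points of (0, 1 - h)
    return min(binary_renyi(q + h, q, alpha) for q in qs)

def g_half(eps):
    """Closed form (15) for alpha = 1/2."""
    return -math.log(1 - eps**2 / 4)

def g_two(eps):
    """Closed form (16) for alpha = 2."""
    return math.log(1 + eps**2) if eps <= 1 else -math.log(1 - eps / 2)

def lower_bound(alpha, eps):
    """Right-hand side of (18) with the constants in (19); alpha in (0,1)."""
    c1 = min(1.0, alpha / (1 - alpha))
    c2 = -math.log(2) / (1 - alpha)
    return c1 * math.log(1 / (1 - eps / 2)) + c2

for eps in (0.5, 1.0, 1.5):
    print(eps, g_grid(0.5, eps), g_half(eps), g_grid(2.0, eps), g_two(eps))
    # Skew-symmetry (17), here for alpha = 0.3.
    print(g_grid(0.3, eps), 0.3 / 0.7 * g_grid(0.7, eps))
```

For $\alpha = 2$ and $\varepsilon > 1$ the minimizer sits at the end of the interval, so the grid value only approaches (16) as the grid is refined.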

Remark 1: The lower bound on $g_\alpha(\cdot)$ in (18) provides another proof of Lemma 1 since it first
yields that $\lim_{\varepsilon \uparrow 2} g_\alpha(\varepsilon) = \infty$ for $\alpha \in (0,1)$; this lemma also holds for $\alpha \ge 1$ since $D_\alpha(P\|Q)$
is monotonically increasing in its order $\alpha$.

In the following, we use Lagrange duality to obtain an alternative expression as a solution of
the minimization problem for $g_\alpha$. Recall that Proposition 1 applies to every $\alpha > 0$. The following
results considerably simplify the computation of $g_\alpha$ for $\alpha \in (0,1)$.
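Lemma 3: Let $\alpha \in (0,1)$ and $\varepsilon' \in (0,1)$. The function

$$f_{\alpha,\varepsilon'}(q) = \frac{\bigl(1 - \frac{\varepsilon'}{1-q}\bigr)^{\alpha-1} - \bigl(1 + \frac{\varepsilon'}{q}\bigr)^{\alpha-1}}{\bigl(1 + \frac{\varepsilon'}{q}\bigr)^{\alpha} - \bigl(1 - \frac{\varepsilon'}{1-q}\bigr)^{\alpha}}, \quad \forall \, q \in (0, 1 - \varepsilon') \tag{20}$$

is strictly monotonically increasing, positive, continuous, and

$$\lim_{q \to 0^+} f_{\alpha,\varepsilon'}(q) = 0, \qquad \lim_{q \to (1-\varepsilon')^-} f_{\alpha,\varepsilon'}(q) = +\infty. \tag{21}$$

Proof: See Appendix III. ■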

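Corollary 1: For $\alpha \in (0,1)$ and $\varepsilon' \in (0,1)$, the equation

$$f_{\alpha,\varepsilon'}(q) = \frac{1-\alpha}{\alpha} \tag{22}$$

has a unique solution $q \in (0, 1 - \varepsilon')$.

Proof: Existence follows from Lemma 3 and the intermediate value theorem for continuous functions,
and uniqueness follows from the strict monotonicity in Lemma 3. ■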

Remark 2: Since $f_{\alpha,\varepsilon'} \colon (0, 1 - \varepsilon') \to (0, \infty)$ is strictly monotonically increasing (see Lemma 3),
the numerical calculation of the unique solution of equation (22) is easy.

An alternative simplified form for the optimization problem in Proposition 1 is next provided
for orders $\alpha \in (0,1)$. Recall that Proposition 1 applies to every $\alpha > 0$, whereas the following is
restricted to $\alpha \in (0,1)$. This, however, proves to be very useful in the next section in terms of
obtaining a significant reduction in the computational complexity of $g_\alpha(\cdot)$, where only $\alpha \in (0,1)$
is of interest therein.*

Proposition 3: Let $\alpha \in (0,1)$, $\varepsilon \in (0,2)$, and let $\varepsilon' = \frac{\varepsilon}{2}$. A solution of the minimization
problem for $g_\alpha(\varepsilon)$ in Proposition 1 is obtained by calculating the binary Renyi divergence
$d_\alpha(p\|q)$ in (14) while taking the unique solution $q \in (0, 1 - \varepsilon')$ of (22), and setting $p = q + \varepsilon'$.

Proof: See Appendix IV. ■
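The recipe of Proposition 3 can be sketched in a few lines. This is a minimal illustration rather than the implementation used for the figures: it assumes that bisection is used as the root finder (any method exploiting the monotonicity in Lemma 3 would do), and the names `f`, `solve_q` and `g_alpha` as well as the iteration count are choices made for this example.

```python
import math

def f(q, alpha, e):
    """The function f_{alpha,e}(q) of (20) on the open interval (0, 1 - e)."""
    u = 1 + e / q          # equals p/q for p = q + e
    v = 1 - e / (1 - q)    # equals (1-p)/(1-q) for p = q + e
    return (v**(alpha - 1) - u**(alpha - 1)) / (u**alpha - v**alpha)

def solve_q(alpha, e, iters=50):
    """Bisection for (22), f(q) = (1 - alpha)/alpha, using that f is increasing.

    Assumes the root is not within rounding distance of the end points.
    """
    target = (1 - alpha) / alpha
    lo, hi = 0.0, 1 - e
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid, alpha, e) < target:
            lo = mid       # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

def g_alpha(alpha, eps):
    """g_alpha(eps) for alpha in (0,1) via Proposition 3; returns (value, p, q)."""
    e = eps / 2
    q = solve_q(alpha, e)
    p = q + e
    s = p**alpha * q**(1 - alpha) + (1 - p)**alpha * (1 - q)**(1 - alpha)
    return math.log(s) / (alpha - 1), p, q

# For alpha = 1/2 the value should agree with the closed form (15).
print(g_alpha(0.5, 1.0), -math.log(1 - 1.0**2 / 4))
```

Each bisection step costs a single evaluation of $f$, which is consistent with the reduction in computational complexity mentioned above when compared with a two-dimensional search over $(p, q)$.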

In view of Proposition 3, the plots in Figures 1 and 2 provide numerical results.


*This saving in the computational complexity reduced the running time of the numerical calculations on our
computer by two orders of magnitude.
Fig. 1. A plot of the minimum of the Renyi divergence $D_\alpha(P_1\|P_2)$ subject to the constraint $|P_1 - P_2| \ge \varepsilon$ where
$\varepsilon \in [0, 2)$ (horizontal axis: $\varepsilon$). The curves in this plot correspond to $\alpha = 0.25$ (thick solid curve), $\alpha = 0.50$ (thin solid curve), $\alpha = 0.75$
(thick dashed curve), and $\alpha = 1.00$ (thin dashed curve, referring to the relative entropy).


III. The Locus of $(D(Q\|P_1), D(Q\|P_2))$ With a Constrained Total Variation Distance

In this section, we address the following question:

Question 1: What is the locus of the points $(D(Q\|P_1), D(Q\|P_2))$ if $P_1, P_2, Q$ are arbitrary
probability measures which are mutually absolutely continuous, and $|P_1 - P_2| \ge \varepsilon$ for a given
value $\varepsilon \in (0, 2)$? (None of the three probability measures is fixed.)

The present section provides an exact characterization of this locus in view of the solution to
the minimization problem in Section II and the following lemma:

Lemma 4: Let $P_1, P_2, Q$ be pairwise mutually absolutely continuous probability measures
defined on a measurable space $(\mathcal{A}, \mathscr{F})$. Then, for $\alpha \in (0,1) \cup (1,\infty)$,

$$D_\alpha(P_1\|P_2) = D(Q\|P_2) + \frac{\alpha}{1-\alpha} \cdot D(Q\|P_1) + \frac{1}{\alpha-1} \cdot D(Q\|Q_\alpha) \tag{23}$$

where the probability measure $Q_\alpha$ is given by

$$Q_\alpha(x) = \frac{P_1^{\alpha}(x) \, P_2^{1-\alpha}(x)}{\sum_{u} P_1^{\alpha}(u) \, P_2^{1-\alpha}(u)}, \quad \forall \, x \in \mathcal{A}. \tag{24}$$

Proof: See Appendix V. ■

Fig. 2. A plot of the minimum of the Renyi divergence $D_\alpha(P_1\|P_2)$ of order $\alpha = 0.90$ subject to the constraint
$|P_1 - P_2| \ge \varepsilon$ with $\varepsilon \in [0,2)$. The exact minimum (thick solid curve) is compared with the Pinsker-type lower bound in
[14, Corollary 9] (the thin solid curve), and its weaker version in [14, Corollary 6] (the dashed curve).

As a corollary of Lemma 4, the following tight inequality holds, which is attributed to van
Erven [7, Lemma 6.6] and Shayevitz [39, Section IV.B.8]. It will be useful for the continuation
of this section, jointly with the results of Section II.

Corollary 2: Let $P_1 \ll\gg P_2$ be mutually absolutely continuous discrete probability measures
defined on a common set $\mathcal{A}$. If $\alpha \in (0,1)$ then

$$\frac{\alpha}{1-\alpha} \cdot D(Q\|P_1) + D(Q\|P_2) \ge D_\alpha(P_1\|P_2) \tag{25}$$

with equality if and only if, for every $x \in \mathcal{A}$,

$$Q(x) = \frac{P_1^{\alpha}(x) \, P_2^{1-\alpha}(x)}{\sum_{u} P_1^{\alpha}(u) \, P_2^{1-\alpha}(u)}. \tag{26}$$

For $\alpha > 1$, inequality (25) is reversed with the same necessary and sufficient condition for an
equality.

Remark 3: The knowledge of the probability measure in (26), which attains equality in (25), is required for the
characterization of the exact locus which is studied in this section.
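The identity of Lemma 4 and the inequality of Corollary 2 are easy to confirm numerically on a small alphabet. The following sketch assumes strictly positive PMFs on a ternary alphabet; the particular values of `P1`, `P2`, `Q` and `alpha`, and the helper names `kl`, `renyi` and `tilted`, are arbitrary choices made for illustration.

```python
import math

def kl(q, p):
    """Relative entropy D(Q||P) in nats for finite PMFs."""
    return sum(a * math.log(a / b) for a, b in zip(q, p) if a > 0)

def renyi(p1, p2, alpha):
    """Renyi divergence D_alpha(P1||P2), alpha in (0,1) or (1,inf)."""
    return math.log(sum(a**alpha * b**(1 - alpha) for a, b in zip(p1, p2))) / (alpha - 1)

def tilted(p1, p2, alpha):
    """The probability measure Q_alpha of (24), proportional to P1^alpha * P2^(1-alpha)."""
    w = [a**alpha * b**(1 - alpha) for a, b in zip(p1, p2)]
    z = sum(w)
    return [x / z for x in w]

# Arbitrary example values (assumptions of this sketch).
P1 = [0.6, 0.3, 0.1]
P2 = [0.2, 0.3, 0.5]
Q = [0.3, 0.4, 0.3]
alpha = 0.4

Qa = tilted(P1, P2, alpha)
lhs = renyi(P1, P2, alpha)
# Right-hand side of (23).
rhs = kl(Q, P2) + alpha / (1 - alpha) * kl(Q, P1) + kl(Q, Qa) / (alpha - 1)
print(lhs, rhs)

# Inequality (25) for an arbitrary Q, and equality when Q = Q_alpha as in (26).
print(alpha / (1 - alpha) * kl(Q, P1) + kl(Q, P2), ">=", lhs)
print(alpha / (1 - alpha) * kl(Qa, P1) + kl(Qa, P2), "==", lhs)
```

Since $\frac{1}{\alpha-1} < 0$ for $\alpha \in (0,1)$, dropping the non-negative term $D(Q\|Q_\alpha)$ from (23) gives (25), with equality exactly when $Q = Q_\alpha$.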


The exact locus of the points $(D(Q\|P_1), D(Q\|P_2))$ is determined as follows: let $|P_1 - P_2| \ge \varepsilon$
for a fixed $\varepsilon \in (0,2)$, and let $\alpha \in (0,1)$ be chosen arbitrarily. By the tight lower bound in
Section II, we have

$$D_\alpha(P_1\|P_2) \ge g_\alpha(\varepsilon) \tag{27}$$

where $g_\alpha$ is expressed in (13). For $\alpha \in (0,1)$ and for a fixed value of $\varepsilon \in (0, 2)$, let $p = p^*$
and $q = q^*$ in $(0, 1)$ be set to achieve the global minimum in (13) (note that, without loss of
generality, one can assume that $p \ge q$ since if $(p, q)$ achieves the minimum in (13) then
$(1 - p, 1 - q)$ also achieves the same minimum). Consequently, the lower bound in (27) is attained
by probability measures $P_1, P_2$ which are defined on a binary alphabet (see Lemma 2) with

$$P_1(0) = p^* = p^*(\alpha, \varepsilon), \quad P_1(1) = 1 - p^*; \qquad P_2(0) = q^* = q^*(\alpha, \varepsilon), \quad P_2(1) = 1 - q^*. \tag{28}$$
From Corollary 2 and (27), (28), it follows that for every $\alpha \in (0,1)$

$$g_\alpha(\varepsilon) \le D(Q\|P_2) + \frac{\alpha}{1-\alpha} \cdot D(Q\|P_1) \tag{29}$$

where equality in (29) holds if $P_1$ and $P_2$ are the probability measures in (28) which are defined
on a binary alphabet, and $Q$ is the respective probability measure in (26) which is therefore
also defined on a binary alphabet. Hence, there exists a triple of probability measures $P_1, P_2, Q$
which are defined on a binary alphabet and satisfy (29) with equality, and these probability
measures are easy to calculate for every $\alpha \in (0,1)$ and $\varepsilon \in (0, 2)$.

Remark 4: Similarly to (29), since $|P_1 - P_2| = |P_2 - P_1|$, it follows from (29) that

$$g_\alpha(\varepsilon) \le D(Q\|P_1) + \frac{\alpha}{1-\alpha} \cdot D(Q\|P_2). \tag{30}$$

By multiplying both sides of (30) by $\frac{1-\alpha}{\alpha}$ and relying on the skew-symmetry property in (17),
it follows that (30) is equivalent to

$$g_{1-\alpha}(\varepsilon) \le D(Q\|P_2) + \frac{1-\alpha}{\alpha} \cdot D(Q\|P_1)$$

which is (29) when $\alpha \in (0,1)$ is replaced by $1 - \alpha$. Hence, since (29) holds for every $\alpha \in (0,1)$,
there is no additional information in (30).

Theorem 1: The exact locus of $(D(Q\|P_1), D(Q\|P_2))$ in the setting of Question 1 is the
convex region whose boundary is the convex envelope of all the straight lines

$$D(Q\|P_2) + \frac{\alpha}{1-\alpha} \cdot D(Q\|P_1) = g_\alpha(\varepsilon), \quad \forall \, \alpha \in (0,1) \tag{31}$$

(i.e., the boundary is the pointwise maximum of the set of straight lines in (31) for $\alpha \in
(0,1)$). Furthermore, all the points in this convex region, including its boundary, are attained by
probability measures $P_1, P_2, Q$ which are defined on a binary alphabet.

Proof: Let $P_1, P_2, Q$ be arbitrary probability measures which are mutually absolutely
continuous and satisfy the $\varepsilon$-separation condition for $P_1$ and $P_2$ in total variation. In view of
Corollary 2 and since, by definition, $D_\alpha(P_1\|P_2) \ge g_\alpha(\varepsilon)$, it follows that the point $(D(Q\|P_1), D(Q\|P_2))$
satisfies

$$D(Q\|P_2) + \frac{\alpha}{1-\alpha} \cdot D(Q\|P_1) \ge g_\alpha(\varepsilon) \tag{32}$$

for every $\alpha \in (0,1)$; this implies that every such point is either on or above the convex
envelope of the parameterized straight lines in (31).

We next prove that a point which is below the convex envelope of the lines in (31) cannot
be achieved under the constraint $|P_1 - P_2| \ge \varepsilon$. The reason is that for such a
point $(D(Q\|P_1), D(Q\|P_2))$, there is some $\alpha \in (0,1)$ for which

$$D(Q\|P_2) + \frac{\alpha}{1-\alpha} \cdot D(Q\|P_1) < g_\alpha(\varepsilon). \tag{33}$$

Since, under the $\varepsilon$-separation condition for $P_1$ and $P_2$ in total variation distance, $D_\alpha(P_1\|P_2) \ge
g_\alpha(\varepsilon)$, inequality (25) is violated for such $\alpha \in (0,1)$; in view of Corollary 2, this yields
that the point is not achievable under the constraint $|P_1 - P_2| \ge \varepsilon$. As an interim conclusion,
it follows that the exact locus of the achievable points is the set of all points in the plane
$(D(Q\|P_1), D(Q\|P_2))$ which are on or above the convex envelope of the parameterized straight
lines in (31) for $\alpha \in (0,1)$.

The next step aims to show that an arbitrary point which is located at the boundary of this
region can be obtained by a triplet of probability measures $(P_1^*, P_2^*, Q^*)$ which are defined on
a binary alphabet, and satisfy $|P_1^* - P_2^*| = \varepsilon$. To that end, note that every point which is on the
boundary of this region is a tangent point to one of the straight lines in (31) for some $\alpha \in (0,1)$.
Accordingly, the proper probability measures $P_1^*$, $P_2^*$ and $Q^*$ can be determined as follows for
a given $\varepsilon \in (0, 2)$:

a) Find the slope $s < 0$ of the tangent line at the selected point on the boundary; in view of
(31), $s = -\frac{\alpha}{1-\alpha}$ yields $\alpha = -\frac{s}{1-s} \in (0,1)$.

b) In view of Proposition 3, determine $p_1, p_2 \in (0,1)$ such that $|p_1 - p_2| = \frac{\varepsilon}{2}$ and
$d_\alpha(p_1\|p_2) = g_\alpha(\varepsilon)$. Consequently, let $P_1^*$ and $P_2^*$ be the probability measures which are
defined on the binary alphabet with $P_1^*(0) = p_1$ and $P_2^*(0) = p_2$.

c) The respective probability measure $Q^* = Q_\alpha$ is calculated from (26), and it is therefore also
defined on the binary alphabet.
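Steps a) to c) above can be sketched as follows. This is an illustrative implementation under stated assumptions: the equation (22) of Corollary 1 is solved by bisection, the measures are represented by their masses at the symbol 0, and the names `kl2`, `f`, `solve_q` and `boundary_point` are hypothetical. The returned point satisfies the straight-line equation (31) by construction.

```python
import math

def kl2(a, b):
    """Relative entropy between Bernoulli(a) and Bernoulli(b), in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def f(q, alpha, e):
    """The function of (20)."""
    u, v = 1 + e / q, 1 - e / (1 - q)
    return (v**(alpha - 1) - u**(alpha - 1)) / (u**alpha - v**alpha)

def solve_q(alpha, e, iters=50):
    """Bisection for (22); f is strictly increasing by Lemma 3."""
    target, lo, hi = (1 - alpha) / alpha, 0.0, 1 - e
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid, alpha, e) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def boundary_point(s, eps):
    """Binary triple (P1*, P2*, Q*) for the boundary point with tangent slope s < 0."""
    alpha = -s / (1 - s)                          # step a)
    e = eps / 2
    q = solve_q(alpha, e)                         # step b): P2*(0) = q
    p = q + e                                     #          P1*(0) = p
    w0 = p**alpha * q**(1 - alpha)
    w1 = (1 - p)**alpha * (1 - q)**(1 - alpha)
    q_star = w0 / (w0 + w1)                       # step c): Q*(0) from (26)
    g = math.log(w0 + w1) / (alpha - 1)           # g_alpha(eps)
    point = (kl2(q_star, p), kl2(q_star, q))      # (D(Q*||P1*), D(Q*||P2*))
    return alpha, p, q, q_star, point, g

alpha, p, q, q_star, (x, y), g = boundary_point(-1.0, 1.0)
print(alpha, p, q, q_star, x, y)
print(y + alpha / (1 - alpha) * x, g)   # both sides of (31)
```

For an interior point, the same routine would be called with a larger value of the separation parameter, as explained in the continuation of the proof.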

Finally, we show that every interior point in the achievable region can be attained as well by
a proper selection of $P_1^*$, $P_2^*$ and $Q^*$ which are defined on a binary alphabet. To that end, note
that every such interior point is located at the boundary of the locus of $(D(Q\|P_1), D(Q\|P_2))$
under the constraint $|P_1 - P_2| \ge \tilde{\varepsilon}$ with some $\tilde{\varepsilon} \in (\varepsilon, 2)$; this follows from the fact that $g_\alpha(\cdot)$
is a strictly monotonically increasing and continuous function in $(0, 2)$, which tends to infinity
as we let $\varepsilon$ tend to 2 (see Lemma 1). It therefore follows that the suitable triplet of probability
measures $(P_1^*, P_2^*, Q^*)$ can be obtained by the same algorithm used for points on the boundary
of this region, except for replacing $\varepsilon$ by the larger value $\tilde{\varepsilon}$.

This concludes the proof by first characterizing the exact locus of points, and then
demonstrating that every point in this convex region (including its boundary) is attained by probability
measures which are defined on the binary alphabet; the proof is also constructive in the sense of
providing an algorithm to calculate such probability measures $P_1^*, P_2^*, Q^*$ for an arbitrary point
in this closed and convex region. ■

As shown in Figure 4, the boundaries of these regions become less curved as $\varepsilon \uparrow 2$.

A Geometric Interpretation of the Minimal Chernoff Information with a Constraint on the 
Variational Distance 

Consider the point in Figure 4 which, in the plane of $(D(Q\|P_1), D(Q\|P_2))$, is the intersection
of the straight line $D(Q\|P_1) = D(Q\|P_2)$ and the boundary of the convex region which is
characterized in Theorem 1 for an arbitrary $\varepsilon \in (0,2)$.

In view of the proof of Theorem 1, this intersection point satisfies $D(Q_\alpha\|P_1) = D(Q_\alpha\|P_2)$
for some $\alpha \in (0,1)$, for $P_1, P_2$ which are probability measures defined on a binary alphabet
with $|P_1 - P_2| = \varepsilon$, and $Q_\alpha$ is given in (26). The equal coordinates of this intersection point
are therefore equal to the Chernoff information $C(P_1, P_2)$ (see [5, Section 11.9]). Due to the
symmetry of this region with respect to the straight line $D(Q\|P_1) = D(Q\|P_2)$ (this follows
from the symmetry property $|P_1 - P_2| = |P_2 - P_1|$), the slope of the tangent line to the
boundary of the convex region at this intersection point is $s = -1$ (see Figure 3). This yields
that $\alpha = \frac{1}{2}$ and, from Proposition 2, $g_{1/2}(\varepsilon) = -\log\bigl(1 - \frac{1}{4}\varepsilon^2\bigr)$. Hence, from (31) with
$\alpha = \frac{1}{2}$, the equal coordinates of this intersection point are given by

$$D(Q\|P_1) = D(Q\|P_2) = -\tfrac{1}{2} \log\bigl(1 - \tfrac{1}{4} \varepsilon^2\bigr). \tag{34}$$

Based on [31, Proposition 2], this value is equal to the minimum of the Chernoff information
subject to an $\varepsilon$-separation constraint for $P_1$ and $P_2$ in total variation distance. We next calculate
the probability measures $P_1^*$, $P_2^*$ and $Q^*$ which attain this intersection point. Eq. (13) with $\alpha = \frac{1}{2}$
yields

$$-2 \log\Bigl( \sqrt{pq} + \sqrt{(1-p)(1-q)} \Bigr) = -\log\bigl(1 - \tfrac{1}{4} \varepsilon^2\bigr) \tag{35}$$

such that $p, q \in [0,1]$ and $|p - q| = \frac{\varepsilon}{2}$. A possible solution of this equation is $p = \frac{1}{2} + \frac{1}{4}\varepsilon$
and $q = \frac{1}{2} - \frac{1}{4}\varepsilon$, so the respective probability measures $P_1^*, P_2^*$ which are defined on the binary
alphabet satisfy $P_1^*(0) = \frac{1}{2} + \frac{1}{4}\varepsilon$ and $P_2^*(0) = \frac{1}{2} - \frac{1}{4}\varepsilon$; consequently, from (26), $Q^*(0) = Q^*(1) = \frac{1}{2}$ is
the equiprobable distribution on the binary alphabet.

Fig. 3. The exact locus of $(D(Q\|P_1), D(Q\|P_2))$ where $P_1, P_2$ are arbitrary probability measures with $|P_1 - P_2| \ge \varepsilon$
with $\varepsilon = 1$. The exact locus of these relative entropies includes all the points on and above the convex envelope of
the straight lines in (31), which is the convex and closed region painted in white.

Fig. 4. This plot shows the 4 exact loci of $(D(Q\|P_1), D(Q\|P_2))$ (both axes in nats) where $P_1, P_2$ are arbitrary probability measures
such that $|P_1 - P_2| \ge \varepsilon$, with $\varepsilon = 1.00, 1.40, 1.80, 1.98$, and $Q \ll P_1, P_2$ is an arbitrary probability measure. The
exact locus which is above the convex envelope for the respective value of $\varepsilon$ (painted in white) shrinks as the value
of $\varepsilon$ is increased, especially when $\varepsilon$ is close (from below) to 2. The intersection of the boundary of the exact locus,
for a given $\varepsilon \in [0,2)$, with the straight line $D(Q\|P_1) = D(Q\|P_2)$ (passing through the origin) is at the point
$\bigl(-\frac{1}{2} \log(1 - \frac{1}{4}\varepsilon^2), \, -\frac{1}{2} \log(1 - \frac{1}{4}\varepsilon^2)\bigr)$; the equal coordinates of this point are the minimum of the Chernoff information
subject to a given total variation distance $\varepsilon$.

As a byproduct of the characterization of the convex region in Theorem 1, it follows that the
straight line $D(Q\|P_1) = D(Q\|P_2)$ (in the plane of Figure 4) intersects the boundary of the
convex region which is specified in Theorem 1 at the point whose coordinates are equal to the
minimized Chernoff information subject to the constraint $|P_1 - P_2| \ge \varepsilon$. The equal coordinates
of each of the 4 intersection points in Figure 4, which refer to $\varepsilon = 1.00, 1.40, 1.80, 1.98$, are
equal to $-\frac{1}{2} \log\bigl(1 - \frac{1}{4}\varepsilon^2\bigr) = 0.144, 0.337, 0.830, 1.959$ nats, respectively.
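The four numerical values quoted above can be reproduced directly from (34). The sketch below also evaluates the Chernoff information (8) of the binary pair $p = \frac{1}{2} + \frac{1}{4}\varepsilon$, $q = \frac{1}{2} - \frac{1}{4}\varepsilon$ by a simple grid over $\alpha$; the grid resolution and the function names `min_chernoff` and `chernoff_binary` are assumptions of this example.

```python
import math

def min_chernoff(eps):
    """Equal coordinates in (34): -(1/2) log(1 - eps^2/4), in nats."""
    return -0.5 * math.log(1 - eps**2 / 4)

def chernoff_binary(p, q, n=1000):
    """Chernoff information (8) between Bernoulli(p) and Bernoulli(q).

    Uses (1 - a) * D_a = -log(sum of P1^a * P2^(1-a)) on a grid of orders a in [0, 1].
    """
    best = 0.0
    for k in range(n + 1):
        a = k / n
        val = -math.log(p**a * q**(1 - a) + (1 - p)**a * (1 - q)**(1 - a))
        best = max(best, val)
    return best

for eps in (1.00, 1.40, 1.80, 1.98):
    p, q = 0.5 + eps / 4, 0.5 - eps / 4
    print(eps, round(min_chernoff(eps), 3), round(chernoff_binary(p, q), 3))
```

For this symmetric pair the maximum over $\alpha$ is attained at $\alpha = \frac{1}{2}$, which is a grid point, so the two printed columns coincide.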
IV. A Performance Bound for Coded Communications via the Renyi Divergence


A. New Exponential Upper Bound 


This section derives an exponential upper bound on the performance of binary linear block 
codes, expressed in terms of the Renyi divergence. Similarly to [19], [20], [21], [23], [25], [33, 
Section 3.B], [36], [40] and [43], the upper bound in the next theorem quantifies the degradation 
in the performance of block codes under ML decoding in terms of the deviation of their distance 
spectra from the binomially distributed (average) distance spectrum of the capacity-achieving 
ensemble of random block codes. 

Theorem 2: Consider a binary linear block code of length $N$ and rate $R = \frac{\log M}{N}$ where $M$
designates the number of codewords. Let $S_0 = 0$ and, for $l \in \{1, \ldots, N\}$, let $S_l$ be the number
of non-zero codewords of Hamming weight $l$. Assume that the transmission of the code takes
place over a memoryless, binary-input and output-symmetric channel. Then, the block error
probability under ML decoding satisfies

$$P_e = P_{e|0} \le \exp\Biggl( -N \, \sup_{r \ge 1} \; \max_{0 \le \rho' \le \frac{1}{r}} \Bigl[ E_0\bigl(\rho', q = (\tfrac{1}{2}, \tfrac{1}{2})\bigr) - \rho' \Bigl( rR + \frac{D_s(P_N\|Q_N)}{N} \Bigr) \Bigr] \Biggr) \tag{36}$$

where $s = s(r) = \frac{r}{r-1}$ for $r > 1$ (with the convention that $s = \infty$ for $r = 1$), $Q_N$ is the
binomial distribution with parameter $\frac{1}{2}$ and $N$ independent trials (i.e., $Q_N(l) = 2^{-N} \binom{N}{l}$ for
$l \in \{0, 1, \ldots, N\}$), $P_N$ is the PMF defined by $P_N(l) = \frac{S_l}{M-1}$ for $l \in \{0, \ldots, N\}$, $D_s(\cdot\|\cdot)$ is the
Renyi divergence of order $s$ (i.e., $D_s(P\|Q) = \frac{1}{s-1} \log\bigl( \sum_x P(x)^s \, Q(x)^{1-s} \bigr)$ where $s > 1$ here),
and $E_0(\rho, q)$ designates the Gallager random coding error exponent in [12, Eq. (5.6.14)].
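To make the quantities in Theorem 2 concrete, the following sketch builds $P_N$ and $Q_N$ from a distance spectrum and evaluates $D_s(P_N\|Q_N)$ for a few values of $r$. The (7,4) Hamming code, whose weight enumerator is $1 + 7x^3 + 7x^4 + x^7$, is used purely as a small illustrative example and is not taken from the paper; the function names are likewise assumptions. The divergence is computed in the log domain so that large orders $s$ do not overflow.

```python
import math

def binomial_pmf(n):
    """Q_N(l) = 2^(-N) * C(N, l) for l = 0, ..., N."""
    return [math.comb(n, l) / 2**n for l in range(n + 1)]

def normalized_spectrum(S):
    """P_N(l) = S_l / (M - 1); with S_0 = 0 the sum of S equals M - 1."""
    total = sum(S)
    return [x / total for x in S]

def renyi(P, Q, s):
    """D_s(P||Q) in nats for s > 1 (s = inf allowed), computed in the log domain."""
    logs = [(math.log(a), math.log(b)) for a, b in zip(P, Q) if a > 0]
    if math.isinf(s):
        return max(la - lb for la, lb in logs)
    t = [s * la + (1 - s) * lb for la, lb in logs]
    m = max(t)
    return (m + math.log(sum(math.exp(x - m) for x in t))) / (s - 1)

# Illustrative example: (7,4) Hamming code, S_l = number of codewords of weight l >= 1.
S = [0, 0, 0, 7, 7, 0, 0, 1]
N, M = 7, 16
P_N, Q_N = normalized_spectrum(S), binomial_pmf(N)

for r in (1.0, 1.5, 2.0, 4.0):
    s = math.inf if r == 1.0 else r / (r - 1)
    print(r, s, renyi(P_N, Q_N, s))
```

Increasing $r$ lowers the order $s$ and hence $D_s(P_N\|Q_N)$, at the price of a larger term $rR$ in (36).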

Before proving Theorem 2, we relate this exponential bound to previously reported bounds.

Remark 5: Note that the loosening of the bound by taking $r = 1$ and, respectively, $s = \infty$
gives the upper bound

$$\begin{aligned}
P_e = P_{e|0} &\le \exp\Biggl( -N \max_{0 \le \rho' \le 1} \Bigl[ E_0\bigl(\rho', q = (\tfrac{1}{2}, \tfrac{1}{2})\bigr) - \rho' \Bigl( R + \frac{D_\infty(P_N\|Q_N)}{N} \Bigr) \Bigr] \Biggr) \\
&\overset{(a)}{=} \exp\Biggl( -N \, E_r\Bigl( R + \frac{D_\infty(P_N\|Q_N)}{N} \Bigr) \Biggr) \\
&\overset{(b)}{=} \exp\Biggl( -N \, E_r\Bigl( R + \frac{1}{N} \log \max_{0 \le l \le N} \frac{P_N(l)}{Q_N(l)} \Bigr) \Biggr) \\
&\overset{(c)}{=} \exp\Biggl( -N \, E_r\Bigl( R + \frac{1}{N} \log \max_{0 \le l \le N} \frac{S_l}{e^{-N(\log 2 - R)} \binom{N}{l}} \Bigr) \Biggr)
\end{aligned}$$

which coincides with the Shulman-Feder bound [40]. Equality (a) follows from the definition
of the Gallager random coding exponent $E_r(R)$ in [12, Eq. (5.6.16)] where the symmetric input
distribution $q = (\frac{1}{2}, \frac{1}{2})$ is the optimal input distribution for any memoryless, binary-input
output-symmetric channel, equality (b) follows from the expression of the Renyi divergence of order
infinity (see, e.g., [8, Theorem 6]), and equality (c) follows from the definition of the PMFs $P_N$
and $Q_N$ in Theorem 2.

Remark 6: The proof of Theorem 2 is based on the framework of the Gallager bounds in [32, Chapter 4] and [36]. Specifically, it has an overlap with [36, Appendix A]. Unlike the analysis in [36, Appendix A], working with the Rényi divergence of order $s > 1$, instead of the relative entropy as a lower bound (see [36, Eq. (A19)]), reveals a need for an optimization of the error exponent, which leads to the error exponent in Theorem 2. Namely, if the value of $r > 1$ is increased, then the value of $s = \frac{r}{r-1} > 1$ is decreased, and therefore $D_s(P_N \| Q_N)$ is also decreased (unless it is zero, see [8, Theorem 3]; note that $P_N$ and $Q_N$ do not depend on the parameters $r$ and $s$, so they are unaffected by varying these parameters). The maximization of the error exponent in Theorem 2 aims at finding a proper balance between the two summands $rR$ and $\frac{D_s(P_N \| Q_N)}{N}$ on the right-hand side of (36), while also performing an optimization over the second dependent variable $\rho' \in [0, \frac{1}{r}]$.
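The balance between $rR$ and $\frac{D_s(P_N\|Q_N)}{N}$ can be explored numerically. The sketch below evaluates the exponent of (36) on a grid for the (7,4) Hamming code over a binary symmetric channel, and compares the choice $r = 1$ (the Shulman-Feder case) with a search over $r \geq 1$. The code, the channel, its crossover probability, and the grid resolutions are assumptions chosen for illustration; `e0_bsc` uses the standard closed form of Gallager's $E_0$ function for a BSC with equiprobable inputs.

```python
import math

def e0_bsc(rho, delta):
    """Gallager's E_0(rho, q=(1/2,1/2)) in nats for a BSC with crossover delta."""
    a = 1.0 / (1.0 + rho)
    return rho * math.log(2.0) - (1.0 + rho) * math.log(delta**a + (1.0 - delta)**a)

def renyi_divergence(p, q, s):
    """Renyi divergence D_s(P||Q) in nats for s > 1 (s = math.inf is allowed)."""
    if math.isinf(s):
        return math.log(max(pi / qi for pi, qi in zip(p, q) if pi > 0))
    total = sum(pi**s * qi**(1.0 - s) for pi, qi in zip(p, q) if pi > 0)
    return math.log(total) / (s - 1.0)

def exponent(p, q, n, rate, delta, r, grid=200):
    """Max over rho' in [0, 1/r] of E_0 - rho' (r R + D_s / N), with s = r/(r-1)."""
    s = math.inf if r == 1.0 else r / (r - 1.0)
    penalty = r * rate + renyi_divergence(p, q, s) / n
    rhos = [k / (grid * r) for k in range(grid + 1)]
    return max(e0_bsc(rho, delta) - rho * penalty for rho in rhos)

N, M = 7, 16
S = [0, 0, 0, 7, 7, 0, 0, 1]              # distance spectrum of the (7,4) Hamming code
P = [w / (M - 1) for w in S]
Q = [math.comb(N, l) / 2.0**N for l in range(N + 1)]
R = math.log(M) / N                        # rate in nats per channel use
delta = 0.01                               # assumed BSC crossover probability

shulman_feder = exponent(P, Q, N, R, delta, r=1.0)
optimized = max(exponent(P, Q, N, R, delta, r=1.0 + 0.05 * k) for k in range(101))
print(shulman_feder, optimized)
```

Since $r = 1$ is one of the grid points, the optimized exponent can never be smaller than the Shulman-Feder exponent; how much larger it is depends on the code and the channel.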

We now proceed with the proof of Theorem 2.

Proof: The proof of Theorem 2 is based on the framework of the Gallager bounds in [32, Chapter 4] and [36]. Specifically, it relies on [36, Appendix A]. We explain in the following how our proof differs from the analysis in [36, Appendix A]. From [36, Eq. (A17)], we have that for every $\rho' \in [0, \frac{1}{r}]$

\[
P_{e|0} \leq M^{r\rho'} \exp\Bigl( -N E_0\bigl(\rho', q = (\tfrac12, \tfrac12)\bigr) \Bigr) \left( \sum_{l=0}^{N} P_N(l)^s \, Q_N(l)^{1-s} \right)^{\frac{r\rho'}{s}}. \tag{37}
\]

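From this point, we deviate from the analysis in [36, Appendix A]. Since $\frac{1}{r} + \frac{1}{s} = 1$ where $r, s > 1$, we have

\[
\left( \sum_{l=0}^{N} P_N(l)^s \, Q_N(l)^{1-s} \right)^{\frac{r\rho'}{s}}
= \exp\left( \frac{\rho'}{s-1} \log\left( \sum_{l=0}^{N} P_N(l)^s \, Q_N(l)^{1-s} \right) \right)
= \exp\bigl( \rho' D_s(P_N \| Q_N) \bigr) \tag{38}
\]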


where $D_s(P_N \| Q_N)$ is the Rényi divergence of order $s$ from $P_N$ to $Q_N$. This enables us to refer to the Rényi divergence of order $s > 1$, instead of lower bounding this quantity by the relative entropy, and consequently loosening the bound (see [36, Eq. (A19)]). Note that since the Rényi divergence is monotonically increasing in its order (see, e.g., [8, Theorem 3]) and the Rényi divergence of order 1 is particularized to the relative entropy, the inequality $D_s(P_N \| Q_N) \geq D(P_N \| Q_N)$ holds. The combination of (37) and (38) gives


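\[
\begin{aligned}
P_{e|0} &\leq \exp(N R \rho' r) \, \exp\Bigl( -N E_0\bigl(\rho', q = (\tfrac12, \tfrac12)\bigr) \Bigr) \, \exp\bigl( \rho' D_s(P_N \| Q_N) \bigr) \\
&= \exp\left( -N \left[ E_0\bigl(\rho', q = (\tfrac12, \tfrac12)\bigr) - \rho' \left( rR + \frac{D_s(P_N \| Q_N)}{N} \right) \right] \right), \quad 0 \leq \rho' \leq \frac{1}{r}.
\end{aligned} \tag{39}
\]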


A maximization of the error exponent in (39) with respect to the parameters $r \geq 1$ and $\rho' \in [0, \frac{1}{r}]$ (recall that $s = s(r) = \frac{r}{r-1} > 1$) gives the upper bound in (36).
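The algebraic step behind the appearance of the Rényi divergence in the proof can be checked numerically. The sketch below takes conjugate exponents $r, s$ with $\frac1r + \frac1s = 1$ and confirms that raising $\sum_l P_N(l)^s Q_N(l)^{1-s}$ to the power $\frac{r\rho'}{s}$ equals $\exp(\rho' D_s(P_N\|Q_N))$, and that $D_s \geq D$ for $s > 1$. The numerical values of $r$ and $\rho'$ and the use of the (7,4) Hamming spectrum are assumptions for illustration.

```python
import math

def renyi_divergence(p, q, s):
    """Renyi divergence D_s(P||Q) in nats for a finite order s > 1."""
    return math.log(sum(pi**s * qi**(1.0 - s) for pi, qi in zip(p, q))) / (s - 1.0)

def relative_entropy(p, q):
    """Relative entropy D(P||Q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0, 0, 0, 7 / 15, 7 / 15, 0, 0, 1 / 15]   # normalized spectrum, (7,4) Hamming code
Q = [math.comb(7, l) / 128 for l in range(8)]  # Binomial(7, 1/2)

r, rho = 3.0, 0.2          # rho' must lie in [0, 1/r]
s = r / (r - 1.0)          # conjugate exponent: 1/r + 1/s = 1

power_sum = sum(pi**s * qi**(1.0 - s) for pi, qi in zip(P, Q))
lhs = power_sum ** (r * rho / s)
rhs = math.exp(rho * renyi_divergence(P, Q, s))
print(lhs, rhs)
print(renyi_divergence(P, Q, s), relative_entropy(P, Q))
```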


B. Application of Theorem 2

An efficient use of Theorem 2 for the performance evaluation of binary linear block codes (or code ensembles) is suggested in the following by borrowing a concept of bounding from [23], which has been further studied, e.g., in [32], [33], [43], and combining it with the new bound in Theorem 2. In order to utilize the Shulman-Feder bound for binary linear block codes in a clever way, it has been suggested in [23] to partition the binary linear block code $\mathcal{C}$ into two subcodes $\mathcal{C}_1$ and $\mathcal{C}_2$ where $\mathcal{C}_1 \cup \mathcal{C}_2 = \mathcal{C}$ and $\mathcal{C}_1 \cap \mathcal{C}_2 = \{0\}$ consists of the all-zero codeword. The first subcode $\mathcal{C}_1$ contains the all-zero codeword and all the codewords of $\mathcal{C}$ whose Hamming weights $l$ belong to a subset $\mathcal{L} \subseteq \{1, 2, \ldots, N\}$, while $\mathcal{C}_2$ contains the other codewords of $\mathcal{C}$, which have Hamming weights $l \in \mathcal{L}^c = \{1, 2, \ldots, N\} \setminus \mathcal{L}$, together with the all-zero codeword. From the symmetry of the channel, $P_e = P_{e|0} \leq P_{e|0}(\mathcal{C}_1) + P_{e|0}(\mathcal{C}_2)$ where $P_{e|0}(\mathcal{C}_1)$ and $P_{e|0}(\mathcal{C}_2)$ designate the conditional ML decoding error probabilities of $\mathcal{C}_1$ and $\mathcal{C}_2$, respectively, given that the all-zero codeword is transmitted. Note that although the code $\mathcal{C}$ is linear, its two subcodes $\mathcal{C}_1$ and $\mathcal{C}_2$ are in general non-linear. One can rely on different upper bounds on the conditional error probabilities $P_{e|0}(\mathcal{C}_1)$ and $P_{e|0}(\mathcal{C}_2)$; i.e., we may bound $P_{e|0}(\mathcal{C}_1)$ by invoking Theorem 2, due to its tightening of the Shulman-Feder bound (see Remark 5), and also rely on an alternative approach for obtaining an upper bound on $P_{e|0}(\mathcal{C}_2)$ (e.g., it is possible to rely on the union bound with respect to the fixed composition codes of the subcode $\mathcal{C}_2$). The idea behind this partitioning is to include in the subcode $\mathcal{C}_1$ the codewords of all the Hamming weights whose distance spectrum is close enough to the binomial distribution $Q_N$ (see Theorem 2) in the sense that the additional term $\frac{D_s(P_N \| Q_N)}{N}$ in the exponent of (36) has a marginal effect on the conditional ML decoding error probability of the subcode $\mathcal{C}_1$.
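The partitioning idea can be sketched in a few lines. The example below splits the Hamming weights according to the ratio $S_l / (M \, 2^{-N} \binom{N}{l})$, keeps the weights with a small ratio in the set $\mathcal{L}$, and bounds the contribution of the remaining weights by a union bound. The threshold rule, the function names, the BSC, and the use of the Bhattacharyya pairwise bound are all assumptions made for illustration; they are a simple stand-in and not the optimized partitioning of [43, Algorithm 1].

```python
import math

def partition_by_ratio(spectrum, n, m, threshold):
    """Split the weights 1..N by the ratio S_l / (M * 2^-N * C(N, l))."""
    ratios = {l: spectrum[l] / (m * math.comb(n, l) / 2.0**n)
              for l in range(1, n + 1) if spectrum[l] > 0}
    inside = sorted(l for l, a in ratios.items() if a <= threshold)
    outside = sorted(l for l in ratios if l not in inside)
    return ratios, inside, outside

def union_bhattacharyya_bsc(spectrum, weights, delta):
    """Union bound over the given weights, Bhattacharyya pairwise bound on a BSC."""
    gamma = 2.0 * math.sqrt(delta * (1.0 - delta))
    return sum(spectrum[l] * gamma**l for l in weights)

N, M = 7, 16
S = [0, 0, 0, 7, 7, 0, 0, 1]     # (7,4) Hamming code, used only as a toy input
ratios, inside, outside = partition_by_ratio(S, N, M, threshold=2.0)

# Rate penalty (1/N) log max ratio, over the weights in L and over all weights.
penalty_inside = math.log(max(ratios[l] for l in inside)) / N
penalty_all = math.log(max(ratios.values())) / N
union_part = union_bhattacharyya_bsc(S, outside, delta=0.01)
print(inside, outside, penalty_inside, penalty_all, union_part)
```

In this toy case the single codeword of weight 7 is responsible for the largest ratio, so moving it to the union-bound part reduces the rate penalty that the remaining weights incur.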

Theorem 2 can be applied as well to ensembles of binary linear block codes. To verify this claim, let $\mathcal{C}$ be an ensemble of binary linear block codes. The proof of Theorem 2 follows from the Duman and Salehi bounding technique [36], which leads to the derivation of [36, Eq. (A.11)]. By taking the expectation on the RHS of [36, Eq. (A.11)] with respect to the code ensemble $\mathcal{C}$ and invoking Jensen's inequality, the same bound holds while $S_l$, as it is defined in Theorem 2, is replaced by the expectation $\bar{S}_l = \mathbb{E}[S_l]$ with respect to the code ensemble $\mathcal{C}$. This enables us to replace $P_N$ on the RHS of (36) with $\bar{P}_N$ where

\[
\bar{P}_N(l) = \frac{\bar{S}_l}{M-1}, \quad \forall \, l \in \{0, \ldots, N\},
\]

which therefore justifies the generalization of Theorem 2 to code ensembles of binary linear block codes.

As exemplified in Section IV-C, Theorem 2 can be efficiently applied to ensembles of turbo-like codes in the same way that it was demonstrated to be efficient in [43]. Similarly to Theorem 2, the bound in [43, Theorem 3.1] forms another refinement of the Shulman-Feder bound, and the novelty in the former bound is the obtained tightening of the Shulman-Feder bound via the use of the Rényi divergence.


C. An Example: Performance Bounds for an Ensemble of Turbo-Block Codes 

We conclude this section by an example which applies this bounding technique to the ensemble of uniformly interleaved turbo codes whose two component codes are chosen uniformly at random from the ensemble of (1072, 1000) binary systematic linear block codes. The transmission of these codes takes place over an additive white Gaussian noise (AWGN) channel, and the codes are BPSK modulated and coherently detected. The calculation of the average distance spectrum of this ensemble has been performed in [43, Section 5.D], which is required for the calculation of the upper bound in (36) where the PD $P_N$ is replaced by its expected value over the ensemble (i.e., the normalization of the average distance spectrum by the number of codewords, as it is defined in Theorem 2). In the following, two upper bounds on the block error probability are compared under ML decoding: the first one is the tangential-sphere bound (TSB) of Herzberg and Poltyrev (see [18], [26], [32, Section 3.2.1]), and the second bound follows from the suggested combination of the union bound and Theorem 2. Note that an optimal partitioning has been performed, in a way which is conceptually similar to [43, Algorithm 1], for obtaining the tightest bound achievable by combining the union bound and Theorem 2.

A comparison of the two bounds shows an advantage of the latter combined bound over the 
TSB in a similar way to [43, upper plot of Fig. 8] (e.g., providing a gain of about 0.2 dB 
over the TSB for a block error probability of 10“^). Note that the Shulman-Feder bound is 
rather loose in this case due to the significant deviation of the ensemble distance spectrum from 
the binomial distribution at low and high Hamming weights. Furthermore, we note that the 
advantage of the proposed bound over the TSB in this example is consistent with the analysis in 
[26] and [42], demonstrating a gap between the random coding error exponent of Gallager and 
the corresponding error exponents that follow from the TSB and some of its improved versions. 
Recall that the random coding error exponent of Gallager achieves the channel capacity, whereas 
the random coding error exponent that follows from the TSB (or some of its improved variants) 
does not achieve the capacity of a binary-input AWGN channel for BPSK modulated fully 
random block codes, where the gap to capacity is especially pronounced for high coding rates. 
In this example, the rate of the ensemble is 0.8741 bits per channel use. 
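The stated rate can be reproduced by a short calculation, assuming the usual parallel concatenation in which the 1000 systematic bits are transmitted once together with the parity bits of both (1072, 1000) component codes. This transmission structure is an assumption of the sketch; it is consistent with the quoted value of 0.8741.

```python
import math

k, n_component = 1000, 1072
parity = n_component - k           # 72 parity bits per component code
n_turbo = k + 2 * parity           # systematic bits once, plus two parity blocks
rate_bits = k / n_turbo            # bits per channel use
rate_nats = rate_bits * math.log(2.0)   # the value of R (in nats) entering (36)
print(n_turbo, round(rate_bits, 4), rate_nats)
```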

Appendix I 

Proofs of Lemmas 1 and 2

A. Proof of Lemma 1

For $\alpha = \frac12$, $D_{1/2}(P \| Q) = -2 \log Z(P, Q)$ where $Z(P, Q) \triangleq \sum_x \sqrt{P(x) Q(x)}$ denotes the Bhattacharyya coefficient between the two PDs $P, Q$. We have

\[
D_{1/2}(P \| Q) \geq -\log\left( 1 - \tfrac14 \varepsilon^2 \right) \tag{I.1}
\]

where $|P - Q| = \varepsilon$ (see, e.g., [31, Proposition 1]; inequality (I.1) is known in quantum information theory with respect to the relation between the trace distance and fidelity [47, Section 9.3]). Hence, (I.1) implies that (10) holds for $\alpha = \frac12$. Since $D_\alpha(P \| Q)$ is monotonically increasing in its order $\alpha$ (see [8, Theorem 3]), it follows that (10) also holds for $\alpha > \frac12$. Finally, due to the skew-symmetry property of $D_\alpha$ (see [8, Proposition 2]) where $D_\alpha(P \| Q) = \left( \frac{\alpha}{1-\alpha} \right) D_{1-\alpha}(Q \| P)$ for $\alpha \in (0,1)$, and since the total variation distance is a symmetric measure and $\frac{\alpha}{1-\alpha} > 0$ for $\alpha \in (0,1)$, the satisfiability of (10) for $\alpha \in (\frac12, 1)$ yields that it also holds for $\alpha \in (0, \frac12)$.
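Two ingredients of this proof, the bound (I.1) and the skew-symmetry of the Rényi divergence, can be spot-checked on random distributions. The sketch below assumes finite alphabets and takes $|P - Q|$ to be the $L_1$ distance $\sum_x |P(x) - Q(x)| \in [0, 2]$, which is the convention implied by the range of $\varepsilon$; the helper names and the random sampling scheme are illustrative.

```python
import math
import random

def renyi_divergence(p, q, a):
    """Renyi divergence D_a(P||Q) in nats for a in (0,1) or a > 1."""
    return math.log(sum(pi**a * qi**(1.0 - a) for pi, qi in zip(p, q))) / (a - 1.0)

def l1_distance(p, q):
    """Total variation distance in the L1 convention, with values in [0, 2]."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_pmf(k, rng):
    """A strictly positive random PMF on k symbols."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(0)
worst_gap = math.inf        # smallest observed slack in (I.1)
worst_skew = 0.0            # largest observed violation of skew-symmetry
for _ in range(1000):
    p, q = random_pmf(4, rng), random_pmf(4, rng)
    eps = l1_distance(p, q)
    gap = renyi_divergence(p, q, 0.5) + math.log(1.0 - eps**2 / 4.0)
    worst_gap = min(worst_gap, gap)
    a = rng.uniform(0.05, 0.95)
    skew = a / (1.0 - a) * renyi_divergence(q, p, 1.0 - a)
    worst_skew = max(worst_skew, abs(renyi_divergence(p, q, a) - skew))
print(worst_gap, worst_skew)
```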


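B. Proof of Lemma 2

Let $P_1 \ll P_2$ be probability measures which are defined on a common measurable space $(\mathcal{A}, \mathscr{F})$. Denote by $\phi \colon \mathcal{A} \to \{1, 2\}$ the mapping given by

\[
\phi(x) =
\begin{cases}
1, & \text{if } \frac{dP_1}{dP_2}(x) \geq 1, \\
2, & \text{if } \frac{dP_1}{dP_2}(x) < 1
\end{cases}
\]

and let $Q_i$, for $i \in \{1, 2\}$, be given by

\[
Q_i(j) = \int_{\{x \in \mathcal{A} \colon \phi(x) = j\}} dP_i(x), \quad \forall \, i, j \in \{1, 2\}. \tag{I.2}
\]

Consequently, we have

\[
\begin{aligned}
|P_1 - P_2| &= \int_{\mathcal{A}} \left| \frac{dP_1}{dP_2}(x) - 1 \right| dP_2(x) \\
&= \int_{\{x \in \mathcal{A} \colon \phi(x) = 1\}} \left( \frac{dP_1}{dP_2}(x) - 1 \right) dP_2(x) + \int_{\{x \in \mathcal{A} \colon \phi(x) = 2\}} \left( 1 - \frac{dP_1}{dP_2}(x) \right) dP_2(x) \\
&= \bigl( Q_1(1) - Q_2(1) \bigr) + \bigl( Q_2(2) - Q_1(2) \bigr) \\
&= \sum_{j \in \{1, 2\}} |Q_1(j) - Q_2(j)| \\
&= |Q_1 - Q_2|.
\end{aligned} \tag{I.3}
\]

From the data processing theorem for the Rényi divergence (see [8, Theorem 9]),

\[
D_\alpha(P_1 \| P_2) \geq D_\alpha(Q_1 \| Q_2) \tag{I.4}
\]

where $Q_1$ and $Q_2$ are the probability measures which are defined on the binary alphabet (see (I.2)). The lemma follows by combining (I.3) and (I.4).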
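For distributions on a finite alphabet, the two-level quantizer used in this proof is easy to implement. The sketch below maps each symbol to 1 or 2 according to whether $P_1(x) \geq P_2(x)$, and confirms that the total variation distance is preserved while the Rényi divergence does not increase. The two example distributions and the function names are assumptions for illustration.

```python
import math

def renyi_divergence(p, q, a):
    """Renyi divergence D_a(P||Q) in nats for a in (0,1) or a > 1."""
    return math.log(sum(pi**a * qi**(1.0 - a) for pi, qi in zip(p, q))) / (a - 1.0)

def l1_distance(p, q):
    """Total variation distance in the L1 convention."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def binary_quantize(p1, p2):
    """Merge symbols with p1(x) >= p2(x) into one cell and the rest into another."""
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for a, b in zip(p1, p2):
        j = 0 if a >= b else 1      # cell index 0 plays the role of phi(x) = 1
        q1[j] += a
        q2[j] += b
    return q1, q2

P1 = [0.5, 0.2, 0.2, 0.1]
P2 = [0.1, 0.3, 0.2, 0.4]
Q1, Q2 = binary_quantize(P1, P2)
print(Q1, Q2)
print(l1_distance(P1, P2), l1_distance(Q1, Q2))
for alpha in (0.5, 2.0, 3.0):
    print(alpha, renyi_divergence(P1, P2, alpha), renyi_divergence(Q1, Q2, alpha))
```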


Appendix II 

Proof of Proposition 2

Eq. (15) follows from the equality $D_{1/2}(P \| Q) = -2 \log Z(P, Q)$ where $Z(P, Q)$ is the Bhattacharyya coefficient between $P, Q$, and since (see [31, Proposition 1])

\[
\max_{P, Q \colon |P - Q| = \varepsilon} Z(P, Q) = \sqrt{1 - \frac{\varepsilon^2}{4}}, \quad \forall \, \varepsilon \in [0, 2).
\]

To prove (16), note that $D_2(P_1 \| P_2) = \log\bigl( 1 + \chi^2(P_1, P_2) \bigr)$ where

\[
\chi^2(P_1 \| P_2) \triangleq \int \left( \frac{dP_1}{dP_2} - 1 \right)^2 dP_2
\]

denotes the $\chi^2$-divergence between the probability measures $P_1$ and $P_2$ (which is the Hellinger divergence of order 2). One can derive a closed-form expression for $g_2$ by relying on the closed-form solution of a minimization of the $\chi^2$-divergence $\chi^2(P_1 \| P_2)$ subject to the constraint $|P_1 - P_2| = \varepsilon \in [0, 2)$, which is given by (see [29, Eq. (58)])

\[
\min_{P_1, P_2 \colon |P_1 - P_2| = \varepsilon} \chi^2(P_1 \| P_2) =
\begin{cases}
\varepsilon^2, & \text{if } \varepsilon \in [0, 1], \\
\frac{\varepsilon}{2 - \varepsilon}, & \text{if } \varepsilon \in (1, 2).
\end{cases}
\]

Eq. (17) follows from the skew-symmetry property of the Rényi divergence [8, Proposition 2].
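The closed-form minimum of the $\chi^2$-divergence can be compared with a brute-force search. The sketch below restricts attention to pairs of binary distributions $(p, 1-p)$ and $(q, 1-q)$ with $p - q = \frac{\varepsilon}{2}$, for which $\chi^2 = \frac{(p-q)^2}{q(1-q)}$; the restriction to a binary alphabet and the grid resolution are assumptions of this illustration.

```python
import math

def chi2_binary(p, q):
    """Chi-square divergence between (p, 1-p) and (q, 1-q)."""
    return (p - q)**2 / (q * (1.0 - q))

def min_chi2_closed_form(eps):
    """Closed-form minimum of chi^2 subject to |P1 - P2| = eps, eps in [0, 2)."""
    return eps**2 if eps <= 1.0 else eps / (2.0 - eps)

def min_chi2_grid(eps, steps=20000):
    """Grid search over q in (0, 1 - eps/2), with p = q + eps/2."""
    half = eps / 2.0
    best = math.inf
    for k in range(1, steps):
        q = k / steps * (1.0 - half)
        best = min(best, chi2_binary(q + half, q))
    return best

for eps in (0.3, 1.0, 1.5):
    closed = min_chi2_closed_form(eps)
    print(eps, closed, min_chi2_grid(eps), math.log(1.0 + closed))
```

The last printed column is $\log(1 + \chi^2)$ evaluated at the minimum, i.e., the corresponding value of the Rényi divergence of order 2.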

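The lower bound on $g_\alpha$ in (18) follows from (13), which implies that for $\alpha \in (0,1)$ and $\varepsilon \in [0, 2)$

\[
g_\alpha(\varepsilon) = \frac{ \log\left( \max_{p, q \in [0,1] \colon |p - q| \geq \frac{\varepsilon}{2}} \bigl( p^\alpha q^{1-\alpha} + (1-p)^\alpha (1-q)^{1-\alpha} \bigr) \right) }{\alpha - 1} \tag{II.1}
\]

and, we have

\[
\begin{aligned}
0 &< \max_{p, q \in [0,1] \colon |p - q| \geq \frac{\varepsilon}{2}} \bigl( p^\alpha q^{1-\alpha} + (1-p)^\alpha (1-q)^{1-\alpha} \bigr) \\
&\leq \max_{p, q \in [0,1] \colon |p - q| \geq \frac{\varepsilon}{2}} p^\alpha q^{1-\alpha} + \max_{p, q \in [0,1] \colon |p - q| \geq \frac{\varepsilon}{2}} (1-p)^\alpha (1-q)^{1-\alpha} \\
&= 2 \max_{p, q \in [0,1] \colon |p - q| \geq \frac{\varepsilon}{2}} p^\alpha q^{1-\alpha} \\
&= 2 \max\left\{ \left( 1 - \tfrac12 \varepsilon \right)^{\alpha}, \left( 1 - \tfrac12 \varepsilon \right)^{1-\alpha} \right\}.
\end{aligned} \tag{II.2}
\]

The lower bound on $g_\alpha$ in (18) follows from the combination of (II.1) and (II.2).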


Appendix III 
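Proof of Lemma 3

For $\alpha \in (0,1)$ and $\varepsilon' \in (0,1)$, we have

\[
\lim_{q \to 0^+} \left( 1 + \frac{\varepsilon'}{q} \right)^{\alpha - 1} = 0, \qquad
\lim_{q \to 0^+} \left( 1 + \frac{\varepsilon'}{q} \right)^{\alpha} = +\infty,
\]
\[
\lim_{q \to 0^+} f_{\alpha,\varepsilon'}(q) = \lim_{q \to 0^+} \frac{ (1 - \varepsilon')^{\alpha - 1} }{ \left( 1 + \frac{\varepsilon'}{q} \right)^{\alpha} - (1 - \varepsilon')^{\alpha} } = 0,
\]

and

\[
\lim_{q \to (1 - \varepsilon')^-} \left( 1 - \frac{\varepsilon'}{1 - q} \right)^{\alpha - 1} = +\infty, \qquad
\lim_{q \to (1 - \varepsilon')^-} \left( 1 - \frac{\varepsilon'}{1 - q} \right)^{\alpha} = 0,
\]
\[
\lim_{q \to (1 - \varepsilon')^-} f_{\alpha,\varepsilon'}(q) = \lim_{q \to (1 - \varepsilon')^-} \frac{ \left( 1 - \frac{\varepsilon'}{1-q} \right)^{\alpha - 1} - (1 - \varepsilon')^{1 - \alpha} }{ (1 - \varepsilon')^{-\alpha} - \left( 1 - \frac{\varepsilon'}{1 - q} \right)^{\alpha} } = +\infty.
\]

This proves the two limits in (21).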

We prove in the following that $f_{\alpha,\varepsilon'}(\cdot)$ is strictly increasing on the interval $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$, and we also prove later in this appendix that this function is monotonically increasing on the interval $(0, \frac{1-\varepsilon'}{2}]$. These two parts of the proof yield that $f_{\alpha,\varepsilon'}(\cdot)$ is strictly monotonically increasing on the interval $(0, 1 - \varepsilon')$. The positivity of $f_{\alpha,\varepsilon'}$ on $(0, 1 - \varepsilon')$ follows from the first limit in (21), jointly with the monotonicity of this function which is proved in the following.

For a proof that $f_{\alpha,\varepsilon'}(\cdot)$ is strictly monotonically increasing on $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$, this function (see (20)) is expressed as follows:


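\[
f_{\alpha,\varepsilon'}(q) = \left( 1 + \frac{\varepsilon'}{q} \right)^{-1} u_\alpha\bigl( z_{\varepsilon'}(q) \bigr) \tag{III.1}
\]

where

\[
z_{\varepsilon'}(q) \triangleq \frac{ 1 - \frac{\varepsilon'}{1 - q} }{ 1 + \frac{\varepsilon'}{q} }, \tag{III.2}
\]

\[
u_\alpha(t) \triangleq
\begin{cases}
\frac{ t^{\alpha - 1} - 1 }{ 1 - t^{\alpha} }, & \text{if } t \in (0,1) \cup (1, \infty), \\
\frac{1 - \alpha}{\alpha}, & \text{if } t = 1.
\end{cases} \tag{III.3}
\]

Note that $u_\alpha$ in (III.3) was defined to be continuous at $t = 1$. In order to proceed, we need the following two lemmas: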

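Lemma III.1: Let $\varepsilon' \in (0,1)$. The function $z_{\varepsilon'}$ in (III.2) is strictly monotonically increasing on $(0, \frac{1-\varepsilon'}{2}]$, and it is strictly monotonically decreasing on $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$. This function is also positive on $(0, 1 - \varepsilon')$.

Proof: $z_{\varepsilon'}(q) > 0$ for $q \in (0, 1 - \varepsilon')$ since $1 - \frac{\varepsilon'}{1-q} > 0$ and $1 + \frac{\varepsilon'}{q} > 0$. In order to prove the monotonicity properties of $z_{\varepsilon'}$, note that its derivative satisfies the equality

\[
z'_{\varepsilon'}(q) = \frac{ \varepsilon' \, (1 - \varepsilon' - 2q) }{ q \, (q + \varepsilon') \, (1 - q) \, (1 - q - \varepsilon') } \; z_{\varepsilon'}(q) \tag{III.4}
\]

which is derived by taking logarithms on both sides of (III.2), followed by their differentiation. By setting the derivative of $z_{\varepsilon'}(q)$ (with respect to $q$) to zero, we have $q = \frac{1-\varepsilon'}{2}$. Since $z_{\varepsilon'}(q) > 0$ for $q \in (0, 1 - \varepsilon')$, it follows from (III.4) that $z'_{\varepsilon'}(q) > 0$ for $q \in (0, \frac{1-\varepsilon'}{2})$ and $z'_{\varepsilon'}(q) < 0$ for $q \in (\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$. Hence, $z_{\varepsilon'}$ is strictly monotonically increasing on $(0, \frac{1-\varepsilon'}{2}]$, and it is strictly monotonically decreasing on $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$. ■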
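The unimodal shape asserted in Lemma III.1 can be visualized numerically. The sketch below assumes that $z_{\varepsilon'}(q)$ is the ratio $\bigl(1 - \frac{\varepsilon'}{1-q}\bigr) / \bigl(1 + \frac{\varepsilon'}{q}\bigr)$, a form inferred here from the positivity argument in the proof and from the location of the stationary point; the value of $\varepsilon'$ and the grid are arbitrary choices.

```python
import math

def z(q, e):
    """Assumed form of z_{eps'}(q) on the interval (0, 1 - eps')."""
    return (1.0 - e / (1.0 - q)) / (1.0 + e / q)

e = 0.3
mid = (1.0 - e) / 2.0                       # claimed maximizer
qs = [k / 1000 * (1.0 - e) for k in range(1, 1000)]
vals = [z(q, e) for q in qs]

rising = all(a < b for a, b in zip(vals[:500], vals[1:500]))    # up to q = mid
falling = all(a > b for a, b in zip(vals[499:], vals[500:]))    # from q = mid on
print(rising, falling, z(mid, e), ((1.0 - e) / (1.0 + e))**2)
```

Under this assumed form, the maximal value is $\bigl(\frac{1-\varepsilon'}{1+\varepsilon'}\bigr)^2$, which is strictly smaller than 1.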

Lemma III.2: Let $\alpha \in (0,1)$. The function $u_\alpha$ in (III.3) is strictly monotonically decreasing and positive on $(0, \infty)$.

Proof: Differentiation of $u_\alpha$ in (III.3) gives that for $t > 0$

\[
u'_\alpha(t) = \frac{ t^{\alpha - 2} \, \bigl( t^{\alpha} - \alpha t + \alpha - 1 \bigr) }{ (t^{\alpha} - 1)^2 }. \tag{III.5}
\]

Note that $\frac{d}{dt} \bigl( t^{\alpha} - \alpha t + \alpha - 1 \bigr) = \alpha \bigl( t^{\alpha - 1} - 1 \bigr)$, so the derivative is zero at $t = 1$, it is positive if $t \in (0,1)$, and it is negative if $t \in (1, \infty)$. This implies that $t^{\alpha} - \alpha t + \alpha - 1 \leq 0$ for every $t \in (0, \infty)$, and it is satisfied with equality if and only if $t = 1$. From (III.5), it follows that $u_\alpha$ is strictly monotonically decreasing on $(0, \infty)$. Since $\lim_{t \to \infty} u_\alpha(t) = 0$ (see (III.3)) and $u_\alpha$ is strictly monotonically decreasing on $(0, \infty)$, it is positive on this interval. ■

From Lemmas III.1 and III.2, it follows that $z_{\varepsilon'}$ is strictly monotonically decreasing and positive on $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$, and $u_\alpha$ is strictly monotonically decreasing and positive on $(0, \infty)$. This therefore implies that the composition $u_\alpha(z_{\varepsilon'}(\cdot))$ is strictly monotonically increasing and positive on the interval $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$. Hence, from (III.1), since $f_{\alpha,\varepsilon'}(\cdot)$ is expressed as a product of two positive and strictly monotonically increasing functions on $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$, also $f_{\alpha,\varepsilon'}$ has these properties on this interval. This completes the first part of the proof where we show that $f_{\alpha,\varepsilon'}(\cdot)$ is strictly monotonically increasing and positive on $[\frac{1-\varepsilon'}{2}, 1 - \varepsilon')$.

We prove in the following that $f_{\alpha,\varepsilon'}(\cdot)$ is also strictly monotonically increasing and positive on $(0, \frac{1-\varepsilon'}{2}]$. For this purpose, the function is expressed in the following alternative way:

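\[
f_{\alpha,\varepsilon'}(q) = \frac{ \left( 1 - \frac{\varepsilon'}{1-q} \right)^{\alpha - 1} - \left( 1 + \frac{\varepsilon'}{q} \right)^{\alpha - 1} }{ \left( 1 + \frac{\varepsilon'}{q} \right)^{\alpha} - \left( 1 - \frac{\varepsilon'}{1-q} \right)^{\alpha} }
= \left( 1 - \frac{\varepsilon'}{1 - q} \right)^{-1} r_\alpha\bigl( z_{\varepsilon'}(q) \bigr) \tag{III.6}
\]

where $z_{\varepsilon'}$ is defined in (III.2), and

\[
r_\alpha(t) \triangleq
\begin{cases}
\frac{ t^{\alpha} - t }{ 1 - t^{\alpha} }, & \text{if } t \in (0, \infty) \setminus \{1\}, \\
\frac{1 - \alpha}{\alpha}, & \text{if } t = 1.
\end{cases} \tag{III.7}
\]

Note that it follows from Lemma III.1 and (III.2) that

\[
z_{\varepsilon'}(q) \leq z_{\varepsilon'}\!\left( \frac{1 - \varepsilon'}{2} \right) = \left( \frac{1 - \varepsilon'}{1 + \varepsilon'} \right)^2 < 1
\]

so the composition $r_\alpha(z_{\varepsilon'}(\cdot))$ in (III.6) is independent of $r_\alpha(1)$; the value of $r_\alpha(1)$ is defined in (III.7) to obtain the continuity of $r_\alpha$, which leads to the following lemma: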

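Lemma III.3: For $\alpha \in (0,1)$, the function $r_\alpha$ in (III.7) is strictly monotonically increasing and positive on $(0, \infty)$.

Proof: A differentiation of $r_\alpha$ in (III.7) gives

\[
r'_\alpha(t) = \frac{ (1 - \alpha) t^{\alpha} + \alpha t^{\alpha - 1} - 1 }{ (1 - t^{\alpha})^2 } \tag{III.8}
\]

so the sign of $r'_\alpha$ is the same as of $(1 - \alpha) t^{\alpha} + \alpha t^{\alpha - 1} - 1$. Since $\alpha \in (0,1)$, and

\[
\frac{d}{dt} \Bigl( (1 - \alpha) t^{\alpha} + \alpha t^{\alpha - 1} - 1 \Bigr) = \alpha (1 - \alpha) \, t^{\alpha - 2} \, (t - 1)
\]

it follows that the last derivative is negative for $t \in (0,1)$, zero at $t = 1$, and positive for $t \in (1, \infty)$. This implies that $t = 1$ is a global minimum of the numerator of $r'_\alpha$ (see (III.8)), so

\[
(1 - \alpha) t^{\alpha} + \alpha t^{\alpha - 1} - 1 \geq 0, \quad \forall \, t \in (0, \infty)
\]

and equality holds if and only if $t = 1$. It therefore follows from (III.8) that $r'_\alpha(t) > 0$ for $t \in (0, \infty) \setminus \{1\}$, so $r_\alpha(\cdot)$ is strictly monotonically increasing on $(0, \infty)$. Since $\lim_{t \to 0} r_\alpha(t) = 0$, the monotonicity of $r_\alpha(\cdot)$ on $(0, \infty)$ yields that it is positive on this interval. ■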

From Lemmas III.1 and III.3, $z_{\varepsilon'}$ is strictly monotonically increasing and positive on $(0, \frac{1-\varepsilon'}{2}]$, and $r_\alpha$ is strictly monotonically increasing and positive on $(0, \infty)$. This implies that the composition $r_\alpha(z_{\varepsilon'}(\cdot))$ is strictly monotonically increasing and positive on the interval $(0, \frac{1-\varepsilon'}{2}]$. From (III.6), $f_{\alpha,\varepsilon'}$ is expressed as a product of two strictly increasing and positive functions on the interval $(0, \frac{1-\varepsilon'}{2}]$, which implies that $f_{\alpha,\varepsilon'}(\cdot)$ also has these properties on this interval. This completes the second part of the proof where we show that $f_{\alpha,\varepsilon'}(\cdot)$ is strictly monotonically increasing and positive on $(0, \frac{1-\varepsilon'}{2}]$. The combination of the two parts of this proof completes the proof of Lemma 3.
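The two factorizations used in this proof, and the resulting monotonicity, can be checked numerically. The sketch below assumes the following forms, which are inferred here from the stationarity condition of Appendix IV and from the derivative expressions in the proofs of Lemmas III.2 and III.3: with $A = 1 - \frac{\varepsilon'}{1-q}$ and $B = 1 + \frac{\varepsilon'}{q}$, it takes $f = \frac{A^{\alpha-1} - B^{\alpha-1}}{B^{\alpha} - A^{\alpha}}$, $z = A/B$, $u_\alpha(t) = \frac{t^{\alpha-1} - 1}{1 - t^{\alpha}}$ and $r_\alpha(t) = \frac{t^{\alpha} - t}{1 - t^{\alpha}}$. The parameter values are arbitrary.

```python
import math

def parts(q, e):
    """Return A = 1 - e/(1-q) and B = 1 + e/q for q in (0, 1 - e)."""
    return 1.0 - e / (1.0 - q), 1.0 + e / q

def f(q, a, e):
    """Assumed form of f_{alpha,eps'}(q)."""
    A, B = parts(q, e)
    return (A**(a - 1.0) - B**(a - 1.0)) / (B**a - A**a)

def u(t, a):
    return (1.0 - a) / a if t == 1.0 else (t**(a - 1.0) - 1.0) / (1.0 - t**a)

def r(t, a):
    return (1.0 - a) / a if t == 1.0 else (t**a - t) / (1.0 - t**a)

a, e = 0.3, 0.3
qs = [k / 1000 * (1.0 - e) for k in range(1, 1000)]
fs = [f(q, a, e) for q in qs]

consistent = True
for q, val in zip(qs, fs):
    A, B = parts(q, e)
    z = A / B
    consistent &= math.isclose(val, u(z, a) / B, rel_tol=1e-9)   # factorization via u
    consistent &= math.isclose(val, r(z, a) / A, rel_tol=1e-9)   # factorization via r
increasing = all(x < y for x, y in zip(fs, fs[1:]))
print(consistent, increasing, fs[0], fs[-1])
```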


Appendix IV 
Proof of Proposition 3

The proof relies on the Lagrange duality and KKT conditions, where strong duality is first asserted by verifying the satisfiability of Slater's condition.

Let $\alpha \in (0,1)$, $\varepsilon \in (0,2)$, and $\varepsilon' = \frac{\varepsilon}{2}$. Solving (13) is equivalent to solving the optimization problem

\[
\begin{aligned}
& \text{maximize} && p^{\alpha} q^{1-\alpha} + (1-p)^{\alpha} (1-q)^{1-\alpha} \\
& \text{subject to} && p, q \in [0,1], \quad |p - q| \geq \varepsilon'
\end{aligned} \tag{IV.1}
\]

where $p, q$ are the optimization variables. The objective function of the optimization problem (IV.1) is concave for $\alpha \in (0,1)$, so this maximization problem is a convex optimization problem. Since the problem is also strictly feasible at an interior point of the domain in (IV.1), Slater's condition yields that strong duality holds for this optimization problem (see [4, Section 5.2.3]). Note that the replacement of $p, q$ with $1 - p$ and $1 - q$, respectively, does not affect the value of the objective function and the satisfiability of the constraints in (IV.1). Consequently, it can be assumed without loss of generality that $p \geq q$; together with the inequality constraint $|p - q| \geq \varepsilon'$, it gives that $p - q \geq \varepsilon'$. The Lagrangian of the dual problem is given by

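\[
L(p, q, \lambda) = p^{\alpha} q^{1-\alpha} + (1-p)^{\alpha} (1-q)^{1-\alpha} + \lambda (q - p + \varepsilon')
\]

and the KKT conditions lead to the following set of equations:

\[
\begin{cases}
\frac{\partial L}{\partial p} = \alpha \left[ p^{\alpha-1} q^{1-\alpha} - (1-p)^{\alpha-1} (1-q)^{1-\alpha} \right] - \lambda = 0, \\
\frac{\partial L}{\partial q} = (1 - \alpha) \left[ p^{\alpha} q^{-\alpha} - (1-p)^{\alpha} (1-q)^{-\alpha} \right] + \lambda = 0, \\
\frac{\partial L}{\partial \lambda} = q - p + \varepsilon' = 0.
\end{cases} \tag{IV.2}
\]

Eliminating $\lambda$ from the first equation in (IV.2), and substituting it into the second equation gives

\[
\alpha \left[ p^{\alpha-1} q^{1-\alpha} - (1-p)^{\alpha-1} (1-q)^{1-\alpha} \right] + (1 - \alpha) \left[ p^{\alpha} q^{-\alpha} - (1-p)^{\alpha} (1-q)^{-\alpha} \right] = 0. \tag{IV.3}
\]

From the third equation of (IV.2), $p = q + \varepsilon'$. Substituting $p = q + \varepsilon'$ into (IV.3), and re-arranging terms gives the equation $f_{\alpha,\varepsilon'}(q) = \frac{1-\alpha}{\alpha}$ where $f_{\alpha,\varepsilon'}$ is the function in (20).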
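Since the function on the left-hand side of this equation is strictly increasing in $q$ (Lemma 3), the equation can be solved by bisection. The sketch below assumes that $f_{\alpha,\varepsilon'}(q) = \frac{A^{\alpha-1} - B^{\alpha-1}}{B^{\alpha} - A^{\alpha}}$ with $A = 1 - \frac{\varepsilon'}{1-q}$ and $B = 1 + \frac{\varepsilon'}{q}$, and that the target value is $\frac{1-\alpha}{\alpha}$, both obtained here by re-arranging the stationarity condition; it then compares the result with a brute-force grid search over the feasible set of (IV.1). Function names, tolerances and grid sizes are illustrative choices.

```python
import math

def f(q, a, e):
    """Assumed form of f_{alpha,eps'}(q) for q in (0, 1 - eps')."""
    A, B = 1.0 - e / (1.0 - q), 1.0 + e / q
    return (A**(a - 1.0) - B**(a - 1.0)) / (B**a - A**a)

def objective(p, q, a):
    """Objective of (IV.1): p^a q^(1-a) + (1-p)^a (1-q)^(1-a)."""
    return p**a * q**(1.0 - a) + (1.0 - p)**a * (1.0 - q)**(1.0 - a)

def solve_q(a, e, iters=100):
    """Bisection for f(q) = (1 - a)/a on (0, 1 - e), using monotonicity of f."""
    lo, hi = 0.0, 1.0 - e
    target = (1.0 - a) / a
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid, a, e) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

a, e = 0.3, 0.3                       # alpha and eps' = eps / 2
q_star = solve_q(a, e)
best = objective(q_star + e, q_star, a)
g = math.log(best) / (a - 1.0)        # minimal Renyi divergence of order alpha

# Brute force over the grid points with p - q >= eps' (here 120/400 = 0.3).
n = 400
grid_best = max(objective(i / n, j / n, a)
                for i in range(n + 1) for j in range(n + 1) if i - j >= 120)
print(q_star, best, grid_best, g)
```

For $\alpha = \frac12$ the solution is symmetric, $q = \frac{1-\varepsilon'}{2}$, and the resulting divergence agrees with $-\log\bigl(1 - \frac{\varepsilon^2}{4}\bigr)$.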

Appendix V
Proof of Lemma 4

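For $\alpha \in (0, \infty) \setminus \{1\}$, the following equalities hold:

\[
\begin{aligned}
& D(Q \| P_2) + \frac{\alpha}{1 - \alpha} \cdot D(Q \| P_1) + \frac{1}{\alpha - 1} \cdot D(Q \| Q_\alpha) \\
&\overset{(a)}{=} \int dQ \, \log \frac{dQ}{dP_2} + \frac{\alpha}{1 - \alpha} \int dQ \, \log \frac{dQ}{dP_1} + \frac{1}{\alpha - 1} \int dQ \, \log \frac{dQ}{dQ_\alpha} \\
&\overset{(b)}{=} \frac{1}{\alpha - 1} \int dQ \, \log\left( \left( \frac{dP_1}{dQ} \right)^{\alpha} \left( \frac{dP_2}{dQ} \right)^{1 - \alpha} \left( \frac{dQ_\alpha}{dQ} \right)^{-1} \right) \\
&\overset{(c)}{=} \frac{1}{\alpha - 1} \int_{\mathcal{A}} dQ(x) \, \log\left( \int_{\mathcal{A}} \left( \frac{dP_1}{dQ} \right)^{\alpha} \left( \frac{dP_2}{dQ} \right)^{1 - \alpha} dQ \right) \\
&\overset{(d)}{=} \frac{1}{\alpha - 1} \log\left( \int_{\mathcal{A}} \left( \frac{dP_1}{dQ} \right)^{\alpha} \left( \frac{dP_2}{dQ} \right)^{1 - \alpha} dQ \right) \\
&\overset{(e)}{=} D_\alpha(P_1 \| P_2)
\end{aligned}
\]

where (a) follows from the definition of the relative entropy; (b) holds since $P_1, P_2, Q$ are mutually absolutely continuous, which also yields that $Q \ll\gg Q_\alpha$ (in view of (24)); (c) follows from (24); (d) holds since $Q$ is a probability measure; and (e) follows from the equality

\[
D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log\left( \int \left( \frac{dP}{dR} \right)^{\alpha} \left( \frac{dQ}{dR} \right)^{1 - \alpha} dR \right) \tag{V.1}
\]

where $R$ is an arbitrary probability measure such that $P, Q \ll R$ (recall that $Q \ll\gg P_1, P_2$).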
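For distributions on a finite alphabet, the identity established in this appendix can be verified directly. The sketch below assumes that $Q_\alpha$ is the normalized geometric mixture $Q_\alpha(x) \propto P_1(x)^{\alpha} P_2(x)^{1-\alpha}$, which is the form taken here for the measure referred to in (24); the example distributions and helper names are illustrative.

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(P||Q) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def renyi_divergence(p, q, a):
    """Renyi divergence D_a(P||Q) in nats for a in (0,1) or a > 1."""
    return math.log(sum(x**a * y**(1.0 - a) for x, y in zip(p, q))) / (a - 1.0)

def tilted(p1, p2, a):
    """Assumed form of Q_alpha: normalized P1^a * P2^(1-a)."""
    w = [x**a * y**(1.0 - a) for x, y in zip(p1, p2)]
    total = sum(w)
    return [x / total for x in w]

def lemma4_lhs(q, p1, p2, a):
    """D(Q||P2) + a/(1-a) D(Q||P1) + 1/(a-1) D(Q||Q_a)."""
    q_a = tilted(p1, p2, a)
    return (relative_entropy(q, p2)
            + a / (1.0 - a) * relative_entropy(q, p1)
            + relative_entropy(q, q_a) / (a - 1.0))

P1 = [0.5, 0.2, 0.2, 0.1]
P2 = [0.1, 0.3, 0.2, 0.4]
Q = [0.25, 0.25, 0.25, 0.25]          # any Q with full support works
for alpha in (0.3, 2.0):
    print(alpha, lemma4_lhs(Q, P1, P2, alpha), renyi_divergence(P1, P2, alpha))
```

The left-hand side does not depend on the choice of $Q$, which is what makes the identity useful for characterizing the joint range of the relative entropies.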
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